Domains module

class fanc.architecture.domains.Boundaries(**kwargs)

Bases: fanc.architecture.domains.RegionScoreTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, and optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
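The string form accepted above can be illustrated with a small parser. This is only a sketch, not the library's own parsing code: the helper name parse_region_string is hypothetical, the strand is assumed to be written as + or -, and coordinates are assumed to be plain integers without separators or unit suffixes.

```python
import re

# Pattern for '<chromosome>:<start>-<end>[:<strand>]' (assumed plain-integer coordinates)
_REGION_RE = re.compile(
    r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region_string(region_string):
    """Split a region string into the dict fields named in the parameter description."""
    match = _REGION_RE.match(region_string)
    if match is None:
        raise ValueError("Not a valid region string: {!r}".format(region_string))
    return {
        "chromosome": match.group("chromosome"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "strand": match.group("strand"),  # None when no strand was given
    }


print(parse_region_string("chr1:1000-2000"))
print(parse_region_string("chr2:500-900:-"))
```

The resulting dict has the ‘chromosome’, ‘start’, and ‘end’ fields that the dict form of the region argument requires.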

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)
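As a rough illustration of what binning into a fixed number of bins means, the sketch below divides a range into equal-width bins and scores each bin by the overlap-weighted mean of the input intervals. It is an assumption-laden sketch, not the library's implementation: the function name bin_intervals_sketch is hypothetical, coordinates are treated as half-open, and smoothing, NaN replacement and zero handling are omitted.

```python
def bin_intervals_sketch(intervals, bins, interval_range=None):
    """Bin (start, end, score) tuples into `bins` equal-width bins.

    Each bin's score is the overlap-weighted mean of the intervals it
    touches (an assumed scoring rule); uncovered bins get NaN.
    """
    intervals = list(intervals)
    if interval_range is None:
        # Fall back to the span covered by the intervals themselves
        range_start = min(iv[0] for iv in intervals)
        range_end = max(iv[1] for iv in intervals)
    else:
        range_start, range_end = interval_range

    width = (range_end - range_start) / bins
    binned = []
    for b in range(bins):
        bin_start = range_start + b * width
        bin_end = bin_start + width
        weighted_sum = 0.0
        covered = 0.0
        for start, end, score in intervals:
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                weighted_sum += overlap * score
                covered += overlap
        bin_score = weighted_sum / covered if covered > 0 else float("nan")
        binned.append((bin_start, bin_end, bin_score))
    return binned


for row in bin_intervals_sketch([(0, 10, 1.0), (10, 20, 3.0)], bins=4):
    print(row)
```

Passing interval_range widens or narrows the binned span independently of the intervals supplied, which is the situation the interval_range parameter is described for.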

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

Each returned [start, end] pair is range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not 5.

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of the former

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

classmethod from_insulation_score(insulation_score, window_size=None, min_score=None, delta_window=3, log=False, sub_bin_precision=False, call_maxima=False, score_field='score', **kwargs)

Call insulation boundaries based on minima in an insulation vector of this object.

Parameters
  • insulation_score – InsulationScores or InsulationScore object

  • window_size – Window size in base pairs. Only necessary for InsulationScores objects

  • min_score – Minimum difference between minimum and the closest maximum in the insulation vector for a region to be considered a boundary

  • delta_window – Window size in bins to control smoothing of the delta function used to calculate the derivative of the insulation index. Calculation takes into account d bins upstream and d bins downstream for a total window size of 2*d + 1 bins.

  • log – Log2-transform insulation index before boundary calls

  • sub_bin_precision – Call boundaries with sub bin precision, by taking into account the precise zero transition of the delta vector.

  • call_maxima – Call maxima instead of minima as boundaries

  • score_field – Name of the field that holds the scores (default: 'score')

Returns

list of GenomicRegion
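The boundary-calling idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names delta_vector and call_minima are hypothetical, the delta at a bin is taken as the mean of the d downstream scores minus the mean of the d upstream scores, and min_score filtering, log-transformation and sub-bin precision are left out.

```python
import math


def delta_vector(scores, d=3):
    """Approximate the derivative of the insulation vector.

    For bin i, uses d bins upstream and d bins downstream
    (a total window of 2*d + 1 bins, including bin i itself).
    """
    delta = []
    for i in range(len(scores)):
        upstream = [s for s in scores[max(0, i - d):i] if not math.isnan(s)]
        downstream = [s for s in scores[i + 1:i + 1 + d] if not math.isnan(s)]
        if not upstream or not downstream:
            delta.append(float("nan"))
        else:
            delta.append(sum(downstream) / len(downstream) - sum(upstream) / len(upstream))
    return delta


def call_minima(scores, d=3):
    """Return bin indices where the delta vector crosses zero from negative
    to non-negative, i.e. local minima of the insulation vector."""
    delta = delta_vector(scores, d)
    minima = []
    for i in range(len(delta) - 1):
        left, right = delta[i], delta[i + 1]
        if math.isnan(left) or math.isnan(right):
            continue
        if left < 0 <= right:
            # The crossing lies between bins i and i + 1; keep the lower-scoring one
            minima.append(i if scores[i] <= scores[i + 1] else i + 1)
    return minima


insulation = [1.0, 0.8, 0.5, 0.1, 0.5, 0.8, 1.0]
print(call_minima(insulation, d=2))
```

Calling maxima instead of minima, as the call_maxima parameter describes, would correspond to looking for crossings in the opposite direction.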

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

scores()

Return scores as list.

Returns

list of float

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bed()

to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bigwig()

to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_gff()

class fanc.architecture.domains.DirectionalityIndex(**kwargs)

Bases: fanc.architecture.domains.RegionScoreTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, and optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

Each returned [start, end] pair is range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not 5.

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size
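The two conversions, distance_to_bins and bins_to_distance, are inverses of each other for a fixed bin size. The sketch below shows the arithmetic as standalone functions; it assumes a simple division and multiplication by the bin size and truncation to int on the way back, which may differ from the library's rounding behavior. The explicit bin_size argument is an assumption made so the example is self-contained (the methods themselves take no such argument).

```python
def distance_to_bins_sketch(distance, bin_size):
    """Express a distance in base pairs as a (fractional) number of bins."""
    return distance / bin_size


def bins_to_distance_sketch(bins, bin_size):
    """Express a (fractional) number of bins as an integer distance in base pairs."""
    return int(bins * bin_size)


bin_size = 10000  # assumed bin size in base pairs
print(distance_to_bins_sketch(25000, bin_size))
print(bins_to_distance_sketch(2.5, bin_size))
```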

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of the former

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

scores()

Return scores as list.

Returns

list of float

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bed()

to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bigwig()

to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_gff()

class fanc.architecture.domains.DirectionalityIndexes(**kwargs)

Bases: fanc.architecture.domains.RegionScoreParameterTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, and optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

Each returned [start, end] pair is range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not 5.
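To illustrate the range-compatible convention, here is a small sketch using a hypothetical dictionary in the shape described above. The chromosome names and index values are made up, and the helper bins_for is not part of the library.

```python
# Hypothetical chromosome_bins content: chromosome name -> [start index, end index]
chromosome_bins = {"chr1": [0, 5], "chr2": [5, 8]}


def bins_for(chromosome_bins, chromosome):
    """List the bin indices of one chromosome; the end index is exclusive."""
    start, end = chromosome_bins[chromosome]
    return list(range(start, end))


print(bins_for(chromosome_bins, "chr1"))
print(bins_for(chromosome_bins, "chr2"))
```

Because the end index is exclusive, the end of one chromosome can equal the start of the next without any bin being counted twice.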

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of the former

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

classmethod from_hic(hic, window_sizes, weight_field=None, file_name=None, tmpdir=None, **kwargs)

Compute the directionality index for multiple window sizes.

Parameters
  • hic – A compatible Hi-C object

  • window_sizes – A list of window sizes

  • weight_field – Internal. Key of the weight attribute for an edge in this object.

  • file_name – Path to output file. If not provided, will work in memory.

  • tmpdir – Optional. If True, will work in temporary directory until file is closed.

  • kwargs – Keyword arguments passed on to edges

Returns

DirectionalityIndexes
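For orientation, the sketch below computes a directionality index from a small contact matrix, assuming the definition of Dixon et al. (2012): with A the contacts of a bin to its upstream window, B the contacts to its downstream window, and E = (A + B) / 2, the index is sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E). This is not the library's code: the function name is hypothetical, the window is given in bins rather than base pairs, and windows are simply truncated at the matrix edges.

```python
def directionality_index_sketch(matrix, window_bins):
    """Directionality index per bin of a square contact matrix (list of lists)."""
    n = len(matrix)
    di = []
    for i in range(n):
        # Contacts with bins upstream (A) and downstream (B) of bin i
        a = sum(matrix[i][max(0, i - window_bins):i])
        b = sum(matrix[i][i + 1:i + 1 + window_bins])
        e = (a + b) / 2
        if e == 0 or a == b:
            # No contacts, or no directional bias
            di.append(0.0)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


contacts = [
    [0, 10, 0],
    [10, 0, 30],
    [0, 30, 0],
]
print(directionality_index_sketch(contacts, window_bins=1))
```

Running the same function with several values of window_bins mirrors the idea of storing one score vector per window size.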

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

score_regions(parameter, **kwargs)

Construct a new object with regions that have a score attribute which corresponds to scores calculated with this parameter.

Parameters
  • parameter – Use scores calculated with this parameter (e.g. window size)

  • kwargs – Keyword arguments passed to RegionsTable

Returns

RegionScoreTable

scores(parameter, scores=None)

Return scores for a specific parameter size as list.

Parameters
  • parameter – Parameter scores were calculated for, such as window size

  • scores – If provided, set scores for this parameter to the ones in this list.

Returns

list of scores

to_bed(file_name, parameter, subset=None)

Write scores to BED file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

to_bigwig(file_name, parameter, subset=None)

Write scores to BigWig file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

to_gff(file_name, parameter, subset=None)

Write scores to GFF file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

property window_sizes

Get a list of window sizes in this object.

Returns

list of window sizes (int)

class fanc.architecture.domains.InsulationScore(**kwargs)

Bases: fanc.architecture.domains.RegionScoreTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, and optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

Each returned [start, end] pair is range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not 5.

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region that is at the center of a query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of the former

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a region string or GenomicRegion and returns all regions in this object that overlap it.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

scores()

Return scores as list.

Returns

list of float

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bed()

to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bigwig()

to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_gff()

class fanc.architecture.domains.InsulationScores(**kwargs)

Bases: fanc.architecture.domains.RegionScoreParameterTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
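To make the string form concrete, here is a small standalone parser for the chromosome:start-end[:strand] pattern. It is an illustration only: the function name and the regular expression are assumptions, and the library's own parser may accept more variants (for example, unit suffixes such as "1mb").

```python
import re

# chromosome:start-end with an optional :+ or :- strand suffix
REGION_PATTERN = re.compile(
    r"^(?P<chromosome>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def parse_region_string(region):
    """Split a region string into a dict of its components."""
    match = REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(f"not a valid region string: {region!r}")
    return {
        "chromosome": match.group("chromosome"),
        "start": int(match.group("start")),
        "end": int(match.group("end")),
        "strand": match.group("strand"),  # None when the strand is omitted
    }


print(parse_region_string("chr1:1000-2000"))
print(parse_region_string("chr2:5-10:-"))
```

The resulting dict has the ‘chromosome’, ‘start’, and ‘end’ fields that the dict form of the region argument requires.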

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)
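The idea of binning into a fixed number of bins can be sketched as an overlap-weighted average. This is a simplified illustration, not the library's algorithm: it assumes half-open coordinates, takes the intervals as a list (it is iterated once per bin), requires interval_range, and ignores the smoothing and NaN options.

```python
import math


def bin_intervals(intervals, bins, interval_range):
    """Average interval scores into `bins` equal-width bins.

    Each bin's score is the overlap-weighted mean of the intervals
    that intersect it; bins without any overlap get NaN.
    """
    range_start, range_end = interval_range
    width = (range_end - range_start) / bins
    binned = []
    for i in range(bins):
        bin_start = range_start + i * width
        bin_end = bin_start + width
        weighted_sum = 0.0
        total_overlap = 0.0
        for start, end, score in intervals:
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                weighted_sum += overlap * score
                total_overlap += overlap
        score = weighted_sum / total_overlap if total_overlap else math.nan
        binned.append((bin_start, bin_end, score))
    return binned


intervals = [(0, 100, 1.0), (100, 200, 3.0), (200, 400, 5.0)]
result = bin_intervals(intervals, bins=2, interval_range=(0, 400))
for row in result:
    print(row)
```

With two bins over the range 0 to 400, the first bin averages the scores 1.0 and 3.0 with equal weight and the second bin takes the score 5.0 unchanged.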

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs.

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned index pairs are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, not 5.
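The range-compatible convention can be illustrated with a toy list of bins. The sketch assumes the bins are stored sorted by chromosome and that each dictionary value is a [start, end] pair whose end index is exclusive; the standalone function below is illustrative and not the property's actual implementation.

```python
# Hypothetical bins as (chromosome, start, end) tuples, sorted by chromosome.
regions = [
    ("chr1", 1, 1000), ("chr1", 1001, 2000), ("chr1", 2001, 3000),
    ("chr2", 1, 1000), ("chr2", 1001, 2000),
]


def chromosome_bins(regions):
    """Map each chromosome to [first_index, last_index + 1]."""
    bins = {}
    for ix, (chromosome, _, _) in enumerate(regions):
        if chromosome not in bins:
            bins[chromosome] = [ix, ix + 1]
        else:
            bins[chromosome][1] = ix + 1
    return bins


cb = chromosome_bins(regions)
print(cb)  # {'chr1': [0, 3], 'chr2': [3, 5]}

# Because the end index is exclusive, the pair plugs directly into range().
start, end = cb["chr2"]
chr2_regions = [regions[i] for i in range(start, end)]
print(chr2_regions)
```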

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region in this object that lies at the center of each query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of either

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

classmethod from_hic(hic, window_sizes, window_offset=0, file_name=None, tmpdir=None, impute_missing=False, na_threshold=0.5, normalise=True, normalisation_window=None, trim_mean_proportion=0.0, geometric_mean=False, subtract_mean=False, log=True)

Calculate insulation scores with multiple window sizes.

Insulation scores quantify the level of interactions that cross each genomic region. The score is calculated by summing up (normalised) contacts in a square next to the diagonal for each genomic region. Therefore, low scores correspond to highly insulated regions with few interactions spanning them.

Parameters
  • hic – A Hi-C object

  • window_sizes – A window size or list of window sizes used for the sliding window

  • window_offset – An offset of the sliding window in bins from the diagonal

  • file_name – Path to file where insulation scores are saved

  • tmpdir – Optional. If True, will work with file in temporary directory until it is closed

  • impute_missing – Will replace missing / masked values in matrix with their expected value prior to insulation score calculation

  • na_threshold – Fraction of missing values that is tolerated in a sliding window before the score is set to NaN

  • normalise – Normalise insulation score by dividing by chromosome mean or mean of a sliding window if normalisation_window is set.

  • normalisation_window – If None (default), normalisation is performed by dividing insulation scores by the chromosome mean. You can set this to a number of bins to perform a more local normalisation using average values in a window of that size

  • trim_mean_proportion – If > 0, will use a trimmed mean for normalisation, trimming this fraction of scores before calculating the mean. Use this if you expect outliers in insulation scores

  • geometric_mean – Use a geometric mean instead of an arithmetic mean. Recommended if you use log-transformed scores and intend to subtract scores from different samples for comparison

  • subtract_mean – For normalisation, subtract mean instead of dividing by it.

  • log – Log2-transform insulation scores after calculation. With the default parameters, this makes scores roughly symmetrical around 0.

Returns

InsulationScores
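The calculation described above can be sketched on a toy contact matrix in pure Python. This is an illustration under assumptions, not the library's implementation: the exact placement of the square relative to each bin, the use of a sum rather than a mean, and the edge handling are all choices made for this example, and options such as window_offset, impute_missing, and na_threshold are ignored.

```python
import math


def insulation_scores(matrix, window):
    """Sum contacts between the `window` bins upstream and downstream of each bin."""
    n = len(matrix)
    scores = []
    for i in range(n):
        if i - window < 0 or i + window >= n:
            scores.append(math.nan)  # window does not fit at the matrix edges
            continue
        total = 0.0
        for row in range(i - window, i):
            for col in range(i + 1, i + window + 1):
                total += matrix[row][col]
        scores.append(total)
    return scores


def normalise_log2(scores):
    """Divide by the mean of the valid scores, then log2-transform.

    Assumes all valid scores are > 0; zeros would need special handling.
    """
    valid = [s for s in scores if not math.isnan(s)]
    mean = sum(valid) / len(valid)
    return [s if math.isnan(s) else math.log2(s / mean) for s in scores]


# Toy matrix: two domains of four bins each, with strong contacts (10)
# within a domain and weak contacts (1) between domains.
n = 8
matrix = [[10.0 if (i < 4) == (j < 4) else 1.0 for j in range(n)] for i in range(n)]

raw = insulation_scores(matrix, window=2)
normalised = normalise_log2(raw)
print(raw)
print(normalised)
```

In this toy example the bins flanking the domain boundary (indices 3 and 4) receive the lowest raw scores, and after division by the mean and log2 transformation they become negative while bins inside a domain are positive.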

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a region string or GenomicRegion and returns all regions in this object that overlap it.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

score_regions(parameter, **kwargs)

Construct a new object with regions that have a score attribute which corresponds to scores calculated with this parameter.

Parameters
  • parameter – Use scores calculated with this parameter (e.g. window size)

  • kwargs – Keyword arguments passed to RegionsTable

Returns

RegionScoreTable

scores(parameter, scores=None)

Return scores for a specific parameter value as a list.

Parameters
  • parameter – Parameter scores were calculated for, such as window size

  • scores – If provided, set scores for this parameter to the ones in this list.

Returns

list of scores

to_bed(file_name, parameter, subset=None)

Write scores to BED file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

to_bigwig(file_name, parameter, subset=None)

Write scores to BigWig file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

to_gff(file_name, parameter, subset=None)

Write scores to GFF file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

property window_sizes

Get a list of window sizes in this object.

Returns

list of window sizes (int)

class fanc.architecture.domains.RegionMultiScoreTable(score_fields=None, *args, **kwargs)

Bases: fanc.regions.RegionsTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)
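The three score options (smoothing_window, nan_replacement, zero_to_nan) can be pictured as a post-processing step on the binned scores. The sketch below is an assumption-laden illustration: the order in which the options are applied, the centered moving average, and the helper name postprocess are choices made for this example and may differ from the library's behaviour.

```python
import math


def postprocess(scores, smoothing_window=None, nan_replacement=None, zero_to_nan=False):
    """Apply zero masking, smoothing, and NaN replacement to a list of scores."""
    scores = list(scores)
    if zero_to_nan:
        # Treat bins with score 0 as missing.
        scores = [math.nan if s == 0 else s for s in scores]
    if smoothing_window is not None:
        # Centered moving average over the window, ignoring NaN values.
        half = smoothing_window // 2
        smoothed = []
        for i in range(len(scores)):
            window = scores[max(0, i - half): i + half + 1]
            valid = [s for s in window if not math.isnan(s)]
            smoothed.append(sum(valid) / len(valid) if valid else math.nan)
        scores = smoothed
    if nan_replacement is not None:
        scores = [nan_replacement if math.isnan(s) else s for s in scores]
    return scores


print(postprocess([1.0, 2.0, 3.0], smoothing_window=3))
print(postprocess([1.0, 0.0, 3.0], zero_to_nan=True, nan_replacement=-1))
```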

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs.

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned index pairs are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, not 5.

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region in this object that lies at the center of each query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of either

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a region string or GenomicRegion and returns all regions in this object that overlap it.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bed()

to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bigwig()

to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_gff()

class fanc.architecture.domains.RegionScoreParameterTable(parameter_values=None, parameter_prefix='score_', *args, **kwargs)

Bases: fanc.architecture.domains.RegionMultiScoreTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs.

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned index pairs are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, not 5.

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region in this object that lies at the center of each query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of either

Returns

index (or list of indexes) of the region at the center of the query region

flush()

Write buffered data to file.

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)

region_subset(region, *args, **kwargs)

Takes a region string or GenomicRegion and returns all regions in this object that overlap it.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

score_regions(parameter, **kwargs)

Construct a new object with regions that have a score attribute which corresponds to scores calculated with this parameter.

Parameters
  • parameter – Use scores calculated with this parameter (e.g. window size)

  • kwargs – Keyword arguments passed to RegionsTable

Returns

RegionScoreTable

scores(parameter, scores=None)

Return scores for a specific parameter value as a list.

Parameters
  • parameter – Parameter scores were calculated for, such as window size

  • scores – If provided, set scores for this parameter to the ones in this list.

Returns

list of scores
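The get-or-set behaviour of scores(parameter, scores=None) can be sketched with a small in-memory container. This is a toy model, not the PyTables-backed class: the name ParameterScores is hypothetical, and the assumption that each parameter value is stored in a column named parameter_prefix followed by the value is inferred from the constructor signature rather than confirmed.

```python
class ParameterScores:
    """Toy container with one score column per parameter value."""

    def __init__(self, parameter_values, parameter_prefix="score_"):
        self._prefix = parameter_prefix
        self._columns = {self._field(p): [] for p in parameter_values}

    def _field(self, parameter):
        # Assumed naming scheme: prefix followed by the parameter value.
        return f"{self._prefix}{parameter}"

    def scores(self, parameter, scores=None):
        """Return the scores for `parameter`, replacing them first if given."""
        field = self._field(parameter)
        if field not in self._columns:
            raise KeyError(f"no scores stored for parameter {parameter!r}")
        if scores is not None:
            self._columns[field] = list(scores)
        return self._columns[field]


table = ParameterScores(parameter_values=[100000, 500000])
table.scores(100000, [0.2, -1.3, 0.4])  # set scores for the 100 kb window
print(table.scores(100000))             # read them back
print(sorted(table._columns))
```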

to_bed(file_name, parameter, subset=None)

Write scores to BED file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

to_bigwig(file_name, parameter, subset=None)

Write scores to BigWig file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

to_gff(file_name, parameter, subset=None)

Write scores to GFF file.

Parameters
  • file_name – Path to output file

  • parameter – Parameter the scores were calculated for, such as window size

  • subset – A GenomicRegion or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”

class fanc.architecture.domains.RegionScoreTable(**kwargs)

Bases: fanc.regions.RegionsTable

class ChromosomeDescription

Bases: tables.description.IsDescription

Description of the chromosomes in this object.

class RegionDescription

Bases: tables.description.IsDescription

Description of a genomic region for PyTables Table

add_region(region, *args, **kwargs)

Add a genomic region to this object.

This method offers some flexibility in the types of objects that can be loaded. See parameters for details.

Parameters

region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).

add_regions(regions, *args, **kwargs)

Bulk insert multiple genomic regions.

Parameters

regions – List (or any iterator) with objects that describe a genomic region. See add_region for options.

static bin_intervals(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into a fixed number of bins.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bins – Number of bins to divide the region into

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

static bin_intervals_equidistant(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)

Bin a given set of intervals into bins with a fixed size.

Parameters
  • intervals – iterator of tuples (start, end, score)

  • bin_size – Size of each bin in base pairs

  • interval_range – Optional. Tuple (start, end), in base pairs, of the range to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

Returns

iterator of tuples: (start, end, score)

property bin_size

Return the length of the first region in the dataset.

Assumes all bins have equal size.

Returns

int

binned_regions(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)

Same as region_intervals, but returns GenomicRegion objects instead of tuples.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of GenomicRegion objects

bins_to_distance(bins)

Convert fraction of bins to base pairs.

Parameters

bins – float, fraction of bins

Returns

int, base pairs

property chromosome_bins

Returns a dictionary of chromosomes and the start and end index of the bins they cover.

The returned index pairs are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not 5.
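
A minimal sketch of what a range-compatible mapping looks like, assuming regions are stored in order and the list position serves as the bin index. The helper chromosome_bins_sketch is hypothetical and only mirrors the description above.

```python
def chromosome_bins_sketch(regions):
    """Hypothetical helper: map chromosome -> [first_ix, end_ix).

    `regions` is an ordered list of (chromosome, start, end) tuples;
    the position in the list is used as the bin index.
    """
    chromosome_bins = {}
    for ix, (chromosome, _start, _end) in enumerate(regions):
        if chromosome not in chromosome_bins:
            chromosome_bins[chromosome] = [ix, ix + 1]
        else:
            # Extend the exclusive end index to one past the current bin.
            chromosome_bins[chromosome][1] = ix + 1
    return chromosome_bins


regions = [("chr1", 1, 1000), ("chr1", 1001, 2000), ("chr2", 1, 1000)]
cb = chromosome_bins_sketch(regions)
# The pair can be passed straight to range(); the end index is excluded.
chr1_indices = list(range(*cb["chr1"]))
```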

property chromosome_lengths

Returns a dictionary of chromosomes and their length in bp.

chromosomes()

List all chromosomes in this regions table.

Returns

list of chromosome names

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If file was opened with tmpdir in read-only mode: close file and delete temporary copy.

If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.

distance_to_bins(distance)

Convert base pairs to fraction of bins.

Parameters

distance – distance in base pairs

Returns

float, distance as fraction of bin size
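
The two conversions (distance_to_bins and bins_to_distance) are inverses of each other up to rounding. A minimal sketch, assuming a single uniform bin size and rounding to the nearest base pair. The function names carry a _sketch suffix to mark them as illustrations; the library's rounding behavior is not stated here and may differ.

```python
def distance_to_bins_sketch(distance, bin_size):
    """Base pairs -> fraction of bins (float)."""
    return distance / bin_size


def bins_to_distance_sketch(bins, bin_size):
    """Fraction of bins -> base pairs (int); rounding is an assumption."""
    return int(round(bins * bin_size))


fraction = distance_to_bins_sketch(25000, bin_size=10000)
base_pairs = bins_to_distance_sketch(fraction, bin_size=10000)
```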

find_region(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)

Find the region in this object that lies at the center of a query region.

Parameters

query_regions – Region selector string, GenomicRegion, or list of the former

Returns

index (or list of indexes) of the region at the center of the query region
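
One way to picture the lookup: take the midpoint of the query and find the stored region containing it, for example with a binary search over region end coordinates. The sketch below makes that concrete. The helper find_center_region, the tuple layout, and the 1-based inclusive coordinates are assumptions made for this example and do not describe the FAN-C internals.

```python
from bisect import bisect_left


def find_center_region(regions, chromosome, start, end):
    """Hypothetical helper: index of the region containing the query midpoint.

    `regions` is a list of (chromosome, start, end) tuples sorted by
    position, with 1-based inclusive coordinates.
    """
    center = (start + end) // 2
    # Keep original indices for the regions on the requested chromosome.
    candidates = [(ix, r) for ix, r in enumerate(regions) if r[0] == chromosome]
    ends = [r[2] for _ix, r in candidates]
    # First region whose end coordinate is >= the midpoint.
    pos = bisect_left(ends, center)
    if pos == len(candidates):
        return None
    ix, (_chrom, region_start, _region_end) = candidates[pos]
    return ix if region_start <= center else None


regions = [("chr1", 1, 1000), ("chr1", 1001, 2000),
           ("chr1", 2001, 3000), ("chr2", 1, 1000)]
center_ix = find_center_region(regions, "chr1", 500, 2500)
```

Returning None for a midpoint that falls outside all regions is a choice made for the sketch.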

flush()

Write buffered data to file.

intervals(*args, **kwargs)

Alias for region_intervals.

region_bins(*args, **kwargs)

Return slice of start and end indices spanned by a region.

Parameters

args – Provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.

Returns

slice of start and end bin indices

region_data(key, value=None)

Retrieve or add vector data to this object. If data with the same name already exists in this object, it will be replaced.

Parameters
  • key – Name of the data column

  • value – vector with region-based data (one entry per region)

region_intervals(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)

Return equally-sized genomic intervals and associated scores.

Use either bins or bin_size argument to control binning.

Parameters
  • region – String or GenomicRegion object denoting the region to be binned

  • bins – Number of bins to divide the region into

  • bin_size – Size of each bin (alternative to bins argument)

  • smoothing_window – Size of window (in bins) to smooth scores over

  • nan_replacement – NaN values in the scores will be replaced with this value

  • zero_to_nan – If True, will convert bins with score 0 to NaN

  • args – Arguments passed to _region_intervals

  • kwargs – Keyword arguments passed to _region_intervals

Returns

iterator of tuples: (start, end, score)
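
The three score options (smoothing_window, nan_replacement, zero_to_nan) can be pictured as a post-processing step on the binned scores. The sketch below is an illustration only: the helper postprocess_scores, the order in which the options are applied, and the centered moving average that ignores NaN (with an odd window size) are all assumptions, not the documented FAN-C behavior.

```python
import math


def postprocess_scores(scores, smoothing_window=None,
                       nan_replacement=None, zero_to_nan=False):
    """Hypothetical helper illustrating the three score options.

    Order used here (an assumption): zero_to_nan, then nan_replacement,
    then smoothing with a centered moving average that ignores NaN.
    """
    scores = list(scores)
    if zero_to_nan:
        scores = [float("nan") if s == 0 else s for s in scores]
    if nan_replacement is not None:
        scores = [nan_replacement if math.isnan(s) else s for s in scores]
    if smoothing_window:
        half = smoothing_window // 2
        smoothed = []
        for i in range(len(scores)):
            # Window is clipped at the edges of the score vector.
            window = [s for s in scores[max(0, i - half):i + half + 1]
                      if not math.isnan(s)]
            smoothed.append(sum(window) / len(window) if window
                            else float("nan"))
        scores = smoothed
    return scores


smoothed = postprocess_scores([1.0, 2.0, 3.0], smoothing_window=3)
```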

region_subset(region, *args, **kwargs)

Takes a GenomicRegion and returns all regions that overlap with the supplied region.

Parameters

region – String or GenomicRegion object for which covered bins will be returned.
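
The overlap criterion can be sketched as a simple interval test. The helper overlapping_regions and the 1-based inclusive coordinates are assumptions made for this example; the library presumably uses an indexed lookup, not a linear scan.

```python
def overlapping_regions(regions, chromosome, start, end):
    """Hypothetical helper: regions overlapping a 1-based inclusive query."""
    return [
        r for r in regions
        # Two inclusive intervals overlap if each starts before the other ends.
        if r[0] == chromosome and r[1] <= end and r[2] >= start
    ]


regions = [("chr1", 1, 1000), ("chr1", 1001, 2000),
           ("chr1", 2001, 3000), ("chr2", 1, 1000)]
subset = overlapping_regions(regions, "chr1", 900, 1100)
```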

property regions

Iterate over genomic regions in this object.

Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.

Returns

RegionIter

property regions_dict

Return a dictionary with region index as keys and regions as values.

Returns

dict {region.ix: region, …}

scores()

Return scores as list.

Returns

list of float

to_bed(file_name, subset=None, **kwargs)

Export regions as BED file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bed()
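
For orientation, the sketch below writes a minimal three-column BED file by hand. It assumes the input regions use 1-based inclusive coordinates; BED itself uses 0-based, half-open coordinates, so only the start is shifted. The helper write_bed_sketch is hypothetical and is not the write_bed() function referred to above.

```python
import io


def write_bed_sketch(regions, handle):
    """Hypothetical helper: write (chromosome, start, end) tuples as BED3.

    Assumes 1-based inclusive input coordinates; BED uses 0-based,
    half-open coordinates, so only the start is shifted by one.
    """
    for chromosome, start, end in regions:
        # Columns: chrom, chromStart, chromEnd (tab-separated).
        handle.write(f"{chromosome}\t{start - 1}\t{end}\n")


buffer = io.StringIO()
write_bed_sketch([("chr1", 1, 1000), ("chr1", 1001, 2000)], buffer)
bed_text = buffer.getvalue()
```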

to_bigwig(file_name, subset=None, **kwargs)

Export regions as BigWig file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_bigwig()

to_gff(file_name, subset=None, **kwargs)

Export regions as GFF file.

Parameters
  • file_name – Path of file to write regions to

  • subset – optional GenomicRegion or str to write only regions overlapping this region

  • kwargs – Passed to write_gff()