Domains module¶
-
class
fanc.architecture.domains.
Boundaries
(**kwargs)¶ Bases:
fanc.architecture.domains.RegionScoreTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_insulation_score
(insulation_score, window_size=None, min_score=None, delta_window=3, log=False, sub_bin_precision=False, call_maxima=False, score_field='score', **kwargs)¶ Call insulation boundaries based on minima in an insulation vector of this object.
- Parameters
insulation_score –
InsulationScores
orInsulationScore
objectwindow_size – Window size in base pairs. Only necessary for
InsulationScores
objectsmin_score – Minimum difference between minimum and the closest maximum in the insulation vector for a region to be considered a boundary
delta_window – Window size in bins to control smoothing of the delta function used to calculate the derivative of the insulation index. Calculation takes into account d bins upstream and d bins downstream for a total window size of 2*d + 1 bins.
log – Log2-transform insulation index before boundary calls
sub_bin_precision – Call boundaries with sub bin precision, by taking into account the precise zero transition of the delta vector.
call_maxima – Call maxima instead of minima as boundaries
score_field –
- Returns
list of
GenomicRegion
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
scores
()¶ Return scores as list.
- Returns
list of float
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.domains.
DirectionalityIndex
(**kwargs)¶ Bases:
fanc.architecture.domains.RegionScoreTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
scores
()¶ Return scores as list.
- Returns
list of float
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.domains.
DirectionalityIndexes
(**kwargs)¶ Bases:
fanc.architecture.domains.RegionScoreParameterTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_hic
(hic, window_sizes, weight_field=None, file_name=None, tmpdir=None, **kwargs)¶ Compute the directionality index for multiple window sizes.
- Parameters
hic – A compatible Hi-C object
window_sizes – A list of window sizes
weight_field – Internal. Key of the weight attribute for an edge in this object.
file_name – Path to output file. If not provided, will work in memory.
tmpdir – Optional. If
True
, will work in temporary directory until file is closed.kwargs – Keyword arguments passed on to
edges
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
score_regions
(parameter, **kwargs)¶ Construct a new object with regions that have a
score
attribute which corresponds to scores calculated with this parameter. :param parameter: Use scores calculated with this parameter (e.g. window size) :param kwargs: Keyword arguments passed toRegionsTable
:return:RegionScoreTable
-
scores
(parameter, scores=None)¶ Return scores for a specific parameter size as list.
- Parameters
parameter – Parameter scores were calculated for, such as window size
scores – If provided, set scores for this parameter to the ones in this list.
- Returns
list of scores
-
to_bed
(file_name, parameter, subset=None)¶ Write scores to BED file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_bigwig
(file_name, parameter, subset=None)¶ Write scores to BigWig file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_gff
(file_name, parameter, subset=None)¶ Write scores to GFF file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
property
window_sizes
¶ Get a list of window sizes in this object.
- Returns
list of window sizes (int)
-
class
-
class
fanc.architecture.domains.
InsulationScore
(**kwargs)¶ Bases:
fanc.architecture.domains.RegionScoreTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
scores
()¶ Return scores as list.
- Returns
list of float
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.domains.
InsulationScores
(**kwargs)¶ Bases:
fanc.architecture.domains.RegionScoreParameterTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_hic
(hic, window_sizes, window_offset=0, file_name=None, tmpdir=None, impute_missing=False, na_threshold=0.5, normalise=True, normalisation_window=None, trim_mean_proportion=0.0, geometric_mean=False, subtract_mean=False, log=True)¶ Calculate insulation scores with multiple window sizes.
Insulation scores provide a great way to quantify the level of interactions the cross each genomic region. It is calculated by summing up (normalised) contacts in a square next to the diagonal for each genomic region. Therefore, low scores correspond to highly insulated regions with few interactions spanning them.
- Parameters
hic – A Hi-C object
window_sizes – A window size or list of window sizes used for the sliding window
window_offset – An offset of the sliding window in bins from the diagonal
file_name – Path to file where insulation scores are saved
tmpdir – Optional. If
True
, will work with file in temporary directory until it is closedimpute_missing – Will replace missing / masked values in matrix with their expected value prior to insulation score calculation
na_threshold – Fraction of missing values that is tolerated in a sliding window before the score is set to NaN
normalise – Normalise insulation score by dividing by chromosome mean or mean of a sliding window if
normalisation_window
is set.normalisation_window – If
None
(default), normalisation is performed by dividing insulation scores by the chromosome mean. You can set this to a number of bins to perform a more local normalisation using average values in a window of that sizetrim_mean_proportion – If > 0 will use a trimmed mean for normalisation trimming this fraction of scores before calculating the mean. Use this if you expect outliers in insulation scores
geometric_mean – Use a geometric mean instead of arithmetic. If using log-transformed, and if you intend to subtract scores from different samples for comparison, this is recommended
subtract_mean – For normalisation, subtract mean instead of dividing by it.
log – Log2-transform insulation scores after calculation. In the default parameters, this makes scores roughly symmetrical around 0.
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
score_regions
(parameter, **kwargs)¶ Construct a new object with regions that have a
score
attribute which corresponds to scores calculated with this parameter. :param parameter: Use scores calculated with this parameter (e.g. window size) :param kwargs: Keyword arguments passed toRegionsTable
:return:RegionScoreTable
-
scores
(parameter, scores=None)¶ Return scores for a specific parameter size as list.
- Parameters
parameter – Parameter scores were calculated for, such as window size
scores – If provided, set scores for this parameter to the ones in this list.
- Returns
list of scores
-
to_bed
(file_name, parameter, subset=None)¶ Write scores to BED file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_bigwig
(file_name, parameter, subset=None)¶ Write scores to BigWig file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_gff
(file_name, parameter, subset=None)¶ Write scores to GFF file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
property
window_sizes
¶ Get a list of window sizes in this object.
- Returns
list of window sizes (int)
-
class
-
class
fanc.architecture.domains.
RegionMultiScoreTable
(score_fields=None, *args, **kwargs)¶ Bases:
fanc.regions.RegionsTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.domains.
RegionScoreParameterTable
(parameter_values=None, parameter_prefix='score_', *args, **kwargs)¶ Bases:
fanc.architecture.domains.RegionMultiScoreTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
score_regions
(parameter, **kwargs)¶ Construct a new object with regions that have a
score
attribute which corresponds to scores calculated with this parameter. :param parameter: Use scores calculated with this parameter (e.g. window size) :param kwargs: Keyword arguments passed toRegionsTable
:return:RegionScoreTable
-
scores
(parameter, scores=None)¶ Return scores for a specific parameter size as list.
- Parameters
parameter – Parameter scores were calculated for, such as window size
scores – If provided, set scores for this parameter to the ones in this list.
- Returns
list of scores
-
to_bed
(file_name, parameter, subset=None)¶ Write scores to BED file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_bigwig
(file_name, parameter, subset=None)¶ Write scores to BigWig file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_gff
(file_name, parameter, subset=None)¶ Write scores to GFF file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
class
-
class
fanc.architecture.domains.
RegionScoreTable
(**kwargs)¶ Bases:
fanc.regions.RegionsTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
scores
()¶ Return scores as list.
- Returns
list of float
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class