Comparisons module¶
-
class
fanc.architecture.comparisons.
ComparisonMatrix
(*args, **kwargs)¶ Bases:
fanc.matrix.RegionMatrixTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription
¶ Bases:
tables.description.IsDescription
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).check_nodes_exist – Make sure that there are nodes that match source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or
Edge
. Repeatedly callsadd_edge()
, so may be inefficient for large amounts of data.- Parameters
edges – List (or iterator) of edges. See
add_edge()
for details
-
add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
- Parameters
name (str) – name of the mask
description (str) – description of the mask
- Returns
id of the mask
- Return type
int
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(weight1, weight2)¶ Compare two edge weights.
- Parameters
weight1 – float
weight2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
- Parameters
n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
file_name – Output file name for down-sampled object.
- Returns
RegionPairsTable
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
- Parameters
attribute – Name of the attribute, e.g. “weight”
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
iterator over edge attribute
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.- Returns
generator (
Edge
)
-
property
edges
¶ Iterate over contacts / edges.
edges()
is the central function ofRegionPairsContainer
. Here, we will use theHic
implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementingRegionPairsContainer
, includingJuicerHic
andCoolerHic
.import fanc # file from FAN-C examples hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample
Hic
object:len(hic.edges) # 8695
When used in an iterator context,
edges()
iterates over all edges in theRegionPairsContainer
:for edge in hic.edges: # do something with edge print(edge) # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173 # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057 # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876 # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468 # ...
Calling
edges()
as a method has the same effect:# note the '()' for edge in hic.edges(): # do something with edge print(edge) # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173 # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057 # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876 # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468 # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a
GenomicRegion
, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings orGenomicRegion
, only edges between the two regions are returned.# select all edges between chromosome 19 # and any other region: for edge in hic.edges("chr19"): print(edge) # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033 # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978 # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917 # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048 # ... # select all edges that are only on # chromosome 19 for edge in hic.edges(('chr19', 'chr19')): print(edge) # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825 # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517 # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439 # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629 # ... # select inter-chromosomal edges # between chromosomes 18 and 19 for edge in hic.edges(('chr18', 'chr19')): print(edge) # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033 # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978 # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917 # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048 # ...
By default,
edges()
will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-CRegionPairsContainer
objects support lazy loading, where attributes are only read on demand.for edge in hic.edges('chr18', lazy=True): print(edge.source, edge.sink, edge.weight, edge) # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0> # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1> # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2> # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3> # ...
Warning
The lazy iterator reuses the
LazyEdge
object in every iteration, and overwrites theLazyEdge
attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expectedlist(hic.edges())
, with allEdge
objects stored in the list, while this codelist(hic.edges(lazy=True))
will result in a list of identicalLazyEdge
objects. Always ensure you do all edge processing in the loop when working with lazy iterators!When working with normalised contact frequencies, such as obtained through matrix balancing in the example above,
edges()
automatically returns normalised edge weights. In addition, thebias
attribute will (typically) have a value different from 1.When you are interested in the raw contact frequency, use the
norm=False
parameter:for edge in hic.edges('chr18', lazy=True, norm=False): print(edge.source, edge.sink, edge.weight) # 42 42 2120.0 # 24 28 388.0 # 5 76 94.0 # 66 68 470.0 # ...
You can also choose to omit all intra- or inter-chromosomal edges using
intra_chromosomal=False
orinter_chromosomal=False
, respectively.- Returns
Iterator over
Edge
or equivalent.
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a
MaskFilter
.- Parameters
edge_filter – Class implementing
MaskFilter
.queue – If True, filter will be queued and can be executed along with other queued filters using
run_queued_filters()
log_progress – If true, process iterating through all edges will be continuously reported.
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
- Parameters
silent – do not print flush progress
update_mappability – After writing data, update mappability and expected values
-
classmethod
from_matrices
(matrix1, matrix2, file_name=None, tmpdir=None, log_cmp=False, ignore_infinite=True, ignore_zeros=False, scale=True, **kwargs)¶ Create a comparison matrix from two compatible matrix objects.
The resulting object can be treated like any other matrix in FAN-C, offering the same convenience functions for regions and edges.
- Parameters
matrix1 – First matrix object, such as a Hi-C matrix
matrix2 – Second matrix object, such as a Hi-C matrix
file_name – Path to the comparison output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog_cmp – If
True
, log2-transform the comparison matrix value after the comparison has been performed. Useful, for example, for fold-change matricesignore_infinite – If
True
, will remove infinite values from the final comparison matrixignore_zeros – If
True
, will only compare edge weights when both are non-zero.scale – Scale matrices to the same sequencing depth (sum of all edge weights) before the comparison. You can set this to
False
if you know the type of normalisation you performed already takes care of this.kwargs – Keyword arguments passed to
edges()
- Returns
-
get_mask
(key)¶ Search _mask table for key and return Mask.
- Parameters
key (int) – search by mask name
key – search by mask ID
- Returns
Mask
-
get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
- Parameters
ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
- Returns
list of Masks extracted from ix
- Return type
list (Mask)
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
-
marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regoins will be masked in the returned vector unless the
masked
parameter is set toFalse
.By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use
norm=False
. Generally, all parameters accepted byedges()
are supported.- Parameters
masked – Use a numpy masked array to mask entries corresponding to unmappable regions
kwargs – Keyword arguments passed to
edges()
-
matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a
RegionMatrix
from region pairs.- Parameters
key – Matrix selector. See
edges()
for all supported key typeslog – If True, log-transform the matrix entries. Also see log_base
log_base – Base of the log transformation. Default: 2; only used when log=True
default_value – (optional) set the default value of matrix entries that have no associated edge/contact
mask – If False, do not mask unmappable regions
args – Positional arguments passed to
regions_and_matrix_entries()
kwargs – Keyword arguments passed to
regions_and_matrix_entries()
- Returns
-
classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple
RegionMatrixContainer
objects.Merging is done by adding the weight of edges in each object.
- Parameters
matrices – list of
RegionMatrixContainer
- Returns
merged
RegionMatrixContainer
-
possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
- Returns
possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
- Parameters
key – Edge selector, see
edges()
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over edges
-
regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
- Parameters
key – Edge key, see
edges()
oe – If True, will divide observed values by their expected value at the given distance. False by default
oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the
_default_score_field
attribute of the matrix class.args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over (i, j, weight) tuples
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs –
list
ofRegionBased
objects- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters
(log_progress=True)¶ Run queued filters.
- Parameters
log_progress – If true, process iterating through all edges will be continuously reported.
-
scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
- Parameters
matrix – A
Hic
objectweight_column – Name of the column to calculate the scaling factor on
- Returns
float
-
subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
- Parameters
regions – string or GenomicRegion object(s)
kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True works in tmp until object is closed additional parameters are passed to
edges()
- Returns
Hic
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.comparisons.
ComparisonRegions
(*args, **kwargs)¶ Bases:
fanc.regions.RegionsTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(score1, score2)¶ Compare two edge weights.
- Parameters
score1 – float
score2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_regions
(region_based1, region_based2, attribute='score', file_name=None, tmpdir=None, log=False, score_field=None, **kwargs)¶ Compare genomic tracks with region-associated scores.
All scores are assumed to be floats.
- Parameters
region_based1 – First
RegionBased
objectregion_based2 – Second
RegionBased
objectattribute – Name of the attribute to be compared. Typically “score”
file_name – Optional path to an output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog – If
True
, will log2-transform values after comparisonscore_field – Name of the attribute comparison scores will be saved to. Will use
attribute
if not provided.kwargs – Keyword arguments passed on to
regions()
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.comparisons.
ComparisonScores
(*args, **kwargs)¶ Bases:
fanc.architecture.domains.RegionScoreParameterTable
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(score1, score2)¶ Compare two edge weights.
- Parameters
score1 – float
score2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_scores
(scores1, scores2, attributes=None, file_name=None, tmpdir=None, log=False, field_prefix='cmp_', **kwargs)¶ Compare parameter-based scores in a
RegionScoreParameterTable
.- Parameters
scores1 – First
RegionScoreParameterTable
scores2 – Second
RegionScoreParameterTable
attributes – If
None
, will do all possible comparisons. Provide a list of region attributes (e.g. [“insulation_1000000”, “insulation_2000000”]) for specific comparisons.file_name – Optional path to an output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog – log2-transform values after comparison
field_prefix – Prefix of the output field
kwargs – Keyword arguments passed on to
regions()
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
score_regions
(parameter, **kwargs)¶ Construct a new object with regions that have a
score
attribute which corresponds to scores calculated with this parameter. :param parameter: Use scores calculated with this parameter (e.g. window size) :param kwargs: Keyword arguments passed toRegionsTable
:return:RegionScoreTable
-
scores
(parameter, scores=None)¶ Return scores for a specific parameter size as list.
- Parameters
parameter – Parameter scores were calculated for, such as window size
scores – If provided, set scores for this parameter to the ones in this list.
- Returns
list of scores
-
to_bed
(file_name, parameter, subset=None)¶ Write scores to BED file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_bigwig
(file_name, parameter, subset=None)¶ Write scores to BigWig file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_gff
(file_name, parameter, subset=None)¶ Write scores to GFF file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
class
-
class
fanc.architecture.comparisons.
DifferenceMatrix
(*args, **kwargs)¶ Bases:
fanc.architecture.comparisons.ComparisonMatrix
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription
¶ Bases:
tables.description.IsDescription
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).check_nodes_exist – Make sure that there are nodes that match source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or
Edge
. Repeatedly callsadd_edge()
, so may be inefficient for large amounts of data.- Parameters
edges – List (or iterator) of edges. See
add_edge()
for details
-
add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
- Parameters
name (str) – name of the mask
description (str) – description of the mask
- Returns
id of the mask
- Return type
int
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(weight1, weight2)¶ Compare two edge weights.
- Parameters
weight1 – float
weight2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
- Parameters
n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
file_name – Output file name for down-sampled object.
- Returns
RegionPairsTable
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
- Parameters
attribute – Name of the attribute, e.g. “weight”
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
iterator over edge attribute
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.- Returns
generator (
Edge
)
-
property
edges
¶ Iterate over contacts / edges.
edges()
is the central function ofRegionPairsContainer
. Here, we will use theHic
implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementingRegionPairsContainer
, includingJuicerHic
andCoolerHic
.import fanc # file from FAN-C examples hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample
Hic
object:len(hic.edges) # 8695
When used in an iterator context,
edges()
iterates over all edges in theRegionPairsContainer
:for edge in hic.edges: # do something with edge print(edge) # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173 # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057 # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876 # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468 # ...
Calling
edges()
as a method has the same effect:# note the '()' for edge in hic.edges(): # do something with edge print(edge) # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173 # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057 # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876 # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468 # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a
GenomicRegion
, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings orGenomicRegion
, only edges between the two regions are returned.# select all edges between chromosome 19 # and any other region: for edge in hic.edges("chr19"): print(edge) # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033 # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978 # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917 # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048 # ... # select all edges that are only on # chromosome 19 for edge in hic.edges(('chr19', 'chr19')): print(edge) # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825 # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517 # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439 # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629 # ... # select inter-chromosomal edges # between chromosomes 18 and 19 for edge in hic.edges(('chr18', 'chr19')): print(edge) # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033 # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978 # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917 # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048 # ...
By default,
edges()
will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-CRegionPairsContainer
objects support lazy loading, where attributes are only read on demand.for edge in hic.edges('chr18', lazy=True): print(edge.source, edge.sink, edge.weight, edge) # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0> # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1> # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2> # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3> # ...
Warning
The lazy iterator reuses the
LazyEdge
object in every iteration, and overwrites theLazyEdge
attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expectedlist(hic.edges())
, with allEdge
objects stored in the list, while this codelist(hic.edges(lazy=True))
will result in a list of identicalLazyEdge
objects. Always ensure you do all edge processing in the loop when working with lazy iterators!When working with normalised contact frequencies, such as obtained through matrix balancing in the example above,
edges()
automatically returns normalised edge weights. In addition, thebias
attribute will (typically) have a value different from 1.When you are interested in the raw contact frequency, use the
norm=False
parameter:for edge in hic.edges('chr18', lazy=True, norm=False): print(edge.source, edge.sink, edge.weight) # 42 42 2120.0 # 24 28 388.0 # 5 76 94.0 # 66 68 470.0 # ...
You can also choose to omit all intra- or inter-chromosomal edges using
intra_chromosomal=False
orinter_chromosomal=False
, respectively.- Returns
Iterator over
Edge
or equivalent.
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a
MaskFilter
.- Parameters
edge_filter – Class implementing
MaskFilter
.queue – If True, filter will be queued and can be executed along with other queued filters using
run_queued_filters()
log_progress – If true, process iterating through all edges will be continuously reported.
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
- Parameters
silent – do not print flush progress
update_mappability – After writing data, update mappability and expected values
-
classmethod
from_matrices
(matrix1, matrix2, file_name=None, tmpdir=None, log_cmp=False, ignore_infinite=True, ignore_zeros=False, scale=True, **kwargs)¶ Create a comparison matrix from two compatible matrix objects.
The resulting object can be treated like any other matrix in FAN-C, offering the same convenience functions for regions and edges.
- Parameters
matrix1 – First matrix object, such as a Hi-C matrix
matrix2 – Second matrix object, such as a Hi-C matrix
file_name – Path to the comparison output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog_cmp – If
True
, log2-transform the comparison matrix value after the comparison has been performed. Useful, for example, for fold-change matricesignore_infinite – If
True
, will remove infinite values from the final comparison matrixignore_zeros – If
True
, will only compare edge weights when both are non-zero.scale – Scale matrices to the same sequencing depth (sum of all edge weights) before the comparison. You can set this to
False
if you know the type of normalisation you performed already takes care of this.kwargs – Keyword arguments passed to
edges()
- Returns
-
get_mask
(key)¶ Search _mask table for key and return Mask.
- Parameters
key (int) – search by mask name
key – search by mask ID
- Returns
Mask
-
get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
- Parameters
ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
- Returns
list of Masks extracted from ix
- Return type
list (Mask)
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
-
marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regoins will be masked in the returned vector unless the
masked
parameter is set toFalse
.By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use
norm=False
. Generally, all parameters accepted byedges()
are supported.- Parameters
masked – Use a numpy masked array to mask entries corresponding to unmappable regions
kwargs – Keyword arguments passed to
edges()
-
matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a
RegionMatrix
from region pairs.- Parameters
key – Matrix selector. See
edges()
for all supported key typeslog – If True, log-transform the matrix entries. Also see log_base
log_base – Base of the log transformation. Default: 2; only used when log=True
default_value – (optional) set the default value of matrix entries that have no associated edge/contact
mask – If False, do not mask unmappable regions
args – Positional arguments passed to
regions_and_matrix_entries()
kwargs – Keyword arguments passed to
regions_and_matrix_entries()
- Returns
-
classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple
RegionMatrixContainer
objects.Merging is done by adding the weight of edges in each object.
- Parameters
matrices – list of
RegionMatrixContainer
- Returns
merged
RegionMatrixContainer
-
possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
- Returns
possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
- Parameters
key – Edge selector, see
edges()
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over edges
-
regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
- Parameters
key – Edge key, see
edges()
oe – If True, will divide observed values by their expected value at the given distance. False by default
oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the
_default_score_field
attribute of the matrix class.args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over (i, j, weight) tuples
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs –
list
ofRegionBased
objects- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters
(log_progress=True)¶ Run queued filters.
- Parameters
log_progress – If true, process iterating through all edges will be continuously reported.
-
scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
- Parameters
matrix – A
Hic
objectweight_column – Name of the column to calculate the scaling factor on
- Returns
float
-
subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
- Parameters
regions – string or GenomicRegion object(s)
kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True works in tmp until object is closed additional parameters are passed to
edges()
- Returns
Hic
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.comparisons.
DifferenceRegions
(*args, **kwargs)¶ Bases:
fanc.architecture.comparisons.ComparisonRegions
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(score1, score2)¶ Compare two edge weights.
- Parameters
score1 – float
score2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_regions
(region_based1, region_based2, attribute='score', file_name=None, tmpdir=None, log=False, score_field=None, **kwargs)¶ Compare genomic tracks with region-associated scores.
All scores are assumed to be floats.
- Parameters
region_based1 – First
RegionBased
objectregion_based2 – Second
RegionBased
objectattribute – Name of the attribute to be compared. Typically “score”
file_name – Optional path to an output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog – If
True
, will log2-transform values after comparisonscore_field – Name of the attribute comparison scores will be saved to. Will use
attribute
if not provided.kwargs – Keyword arguments passed on to
regions()
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.comparisons.
DifferenceScores
(*args, **kwargs)¶ Bases:
fanc.architecture.comparisons.ComparisonScores
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(score1, score2)¶ Compare two edge weights.
- Parameters
score1 – float
score2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_scores
(scores1, scores2, attributes=None, file_name=None, tmpdir=None, log=False, field_prefix='cmp_', **kwargs)¶ Compare parameter-based scores in a
RegionScoreParameterTable
.- Parameters
scores1 – First
RegionScoreParameterTable
scores2 – Second
RegionScoreParameterTable
attributes – If
None
, will do all possible comparisons. Provide a list of region attributes (e.g. [“insulation_1000000”, “insulation_2000000”]) for specific comparisons.file_name – Optional path to an output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog – log2-transform values after comparison
field_prefix – Prefix of the output field
kwargs – Keyword arguments passed on to
regions()
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
score_regions
(parameter, **kwargs)¶ Construct a new object with regions that have a
score
attribute which corresponds to scores calculated with this parameter. :param parameter: Use scores calculated with this parameter (e.g. window size) :param kwargs: Keyword arguments passed toRegionsTable
:return:RegionScoreTable
-
scores
(parameter, scores=None)¶ Return scores for a specific parameter size as list.
- Parameters
parameter – Parameter scores were calculated for, such as window size
scores – If provided, set scores for this parameter to the ones in this list.
- Returns
list of scores
-
to_bed
(file_name, parameter, subset=None)¶ Write scores to BED file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_bigwig
(file_name, parameter, subset=None)¶ Write scores to BigWig file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_gff
(file_name, parameter, subset=None)¶ Write scores to GFF file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
class
-
class
fanc.architecture.comparisons.
FoldChangeMatrix
(*args, **kwargs)¶ Bases:
fanc.architecture.comparisons.ComparisonMatrix
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription
¶ Bases:
tables.description.IsDescription
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).check_nodes_exist – Make sure that there are nodes that match source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types, list, tuple, dict, or
Edge
. Repeatedly callsadd_edge()
, so may be inefficient for large amounts of data.- Parameters
edges – List (or iterator) of edges. See
add_edge()
for details
-
add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
- Parameters
name (str) – name of the mask
description (str) – description of the mask
- Returns
id of the mask
- Return type
int
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(weight1, weight2)¶ Compare two edge weights.
- Parameters
weight1 – float
weight2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
- Parameters
n – Sample size or reference object. If n < 1 will be interpreted as a fraction of total reads in this object.
file_name – Output file name for down-sampled object.
- Returns
RegionPairsTable
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
- Parameters
attribute – Name of the attribute, e.g. “weight”
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
iterator over edge attribute
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.- Returns
generator (
Edge
)
-
property
edges
¶ Iterate over contacts / edges.
edges()
is the central function ofRegionPairsContainer
. Here, we will use theHic
implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementingRegionPairsContainer
, includingJuicerHic
andCoolerHic
.import fanc # file from FAN-C examples hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample
Hic
object:len(hic.edges) # 8695
When used in an iterator context,
edges()
iterates over all edges in theRegionPairsContainer
:for edge in hic.edges: # do something with edge print(edge) # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173 # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057 # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876 # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468 # ...
Calling
edges()
as a method has the same effect:# note the '()' for edge in hic.edges(): # do something with edge print(edge) # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173 # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057 # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876 # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468 # ...
Rather than iterate over all edges in the object, we can select only a subset. If the key is a string or a
GenomicRegion
, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings orGenomicRegion
, only edges between the two regions are returned.# select all edges between chromosome 19 # and any other region: for edge in hic.edges("chr19"): print(edge) # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033 # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978 # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917 # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048 # ... # select all edges that are only on # chromosome 19 for edge in hic.edges(('chr19', 'chr19')): print(edge) # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825 # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517 # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439 # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629 # ... # select inter-chromosomal edges # between chromosomes 18 and 19 for edge in hic.edges(('chr18', 'chr19')): print(edge) # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033 # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978 # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917 # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048 # ...
By default,
edges()
will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-CRegionPairsContainer
objects support lazy loading, where attributes are only read on demand.for edge in hic.edges('chr18', lazy=True): print(edge.source, edge.sink, edge.weight, edge) # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0> # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1> # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2> # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3> # ...
Warning
The lazy iterator reuses the
LazyEdge
object in every iteration, and overwrites theLazyEdge
attributes. Therefore do not use lazy iterators if you need to store edge objects for later access. For example, the following code works as expectedlist(hic.edges())
, with allEdge
objects stored in the list, while this codelist(hic.edges(lazy=True))
will result in a list of identicalLazyEdge
objects. Always ensure you do all edge processing in the loop when working with lazy iterators!When working with normalised contact frequencies, such as obtained through matrix balancing in the example above,
edges()
automatically returns normalised edge weights. In addition, thebias
attribute will (typically) have a value different from 1.When you are interested in the raw contact frequency, use the
norm=False
parameter:for edge in hic.edges('chr18', lazy=True, norm=False): print(edge.source, edge.sink, edge.weight) # 42 42 2120.0 # 24 28 388.0 # 5 76 94.0 # 66 68 470.0 # ...
You can also choose to omit all intra- or inter-chromosomal edges using
intra_chromosomal=False
orinter_chromosomal=False
, respectively.- Returns
Iterator over
Edge
or equivalent.
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a
MaskFilter
.- Parameters
edge_filter – Class implementing
MaskFilter
.queue – If True, filter will be queued and can be executed along with other queued filters using
run_queued_filters()
log_progress – If true, process iterating through all edges will be continuously reported.
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
- Parameters
silent – do not print flush progress
update_mappability – After writing data, update mappability and expected values
-
classmethod
from_matrices
(matrix1, matrix2, file_name=None, tmpdir=None, log_cmp=False, ignore_infinite=True, ignore_zeros=False, scale=True, **kwargs)¶ Create a comparison matrix from two compatible matrix objects.
The resulting object can be treated like any other matrix in FAN-C, offering the same convenience functions for regions and edges.
- Parameters
matrix1 – First matrix object, such as a Hi-C matrix
matrix2 – Second matrix object, such as a Hi-C matrix
file_name – Path to the comparison output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog_cmp – If
True
, log2-transform the comparison matrix value after the comparison has been performed. Useful, for example, for fold-change matricesignore_infinite – If
True
, will remove infinite values from the final comparison matrixignore_zeros – If
True
, will only compare edge weights when both are non-zero.scale – Scale matrices to the same sequencing depth (sum of all edge weights) before the comparison. You can set this to
False
if you know the type of normalisation you performed already takes care of this.kwargs – Keyword arguments passed to
edges()
- Returns
-
get_mask
(key)¶ Search _mask table for key and return Mask.
- Parameters
key (int) – search by mask name
key – search by mask ID
- Returns
Mask
-
get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. Similar principle to UNIX chmod (although that uses base 8)
- Parameters
ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
- Returns
list of Masks extracted from ix
- Return type
list (Mask)
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
-
marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regoins will be masked in the returned vector unless the
masked
parameter is set toFalse
.By default, corrected matrix entries are summed up. To get uncorrected matrix marginals use
norm=False
. Generally, all parameters accepted byedges()
are supported.- Parameters
masked – Use a numpy masked array to mask entries corresponding to unmappable regions
kwargs – Keyword arguments passed to
edges()
-
matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a
RegionMatrix
from region pairs.- Parameters
key – Matrix selector. See
edges()
for all supported key typeslog – If True, log-transform the matrix entries. Also see log_base
log_base – Base of the log transformation. Default: 2; only used when log=True
default_value – (optional) set the default value of matrix entries that have no associated edge/contact
mask – If False, do not mask unmappable regions
args – Positional arguments passed to
regions_and_matrix_entries()
kwargs – Keyword arguments passed to
regions_and_matrix_entries()
- Returns
-
classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple
RegionMatrixContainer
objects.Merging is done by adding the weight of edges in each object.
- Parameters
matrices – list of
RegionMatrixContainer
- Returns
merged
RegionMatrixContainer
-
possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
- Returns
possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
- Parameters
key – Edge selector, see
edges()
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over edges
-
regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
- Parameters
key – Edge key, see
edges()
oe – If True, will divide observed values by their expected value at the given distance. False by default
oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the
_default_score_field
attribute of the matrix class.args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over (i, j, weight) tuples
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs –
list
ofRegionBased
objects- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters
(log_progress=True)¶ Run queued filters.
- Parameters
log_progress – If true, process iterating through all edges will be continuously reported.
-
scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio between the number of contacts in this Hic object to the number of contacts in another Hic object.
- Parameters
matrix – A
Hic
objectweight_column – Name of the column to calculate the scaling factor on
- Returns
float
-
subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
- Parameters
regions – string or GenomicRegion object(s)
kwargs – Supports file_name: destination file name of subset Hic object; tmpdir: if True works in tmp until object is closed additional parameters are passed to
edges()
- Returns
Hic
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.comparisons.
FoldChangeRegions
(*args, **kwargs)¶ Bases:
fanc.architecture.comparisons.ComparisonRegions
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(score1, score2)¶ Compare two edge weights.
- Parameters
score1 – float
score2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_regions
(region_based1, region_based2, attribute='score', file_name=None, tmpdir=None, log=False, score_field=None, **kwargs)¶ Compare genomic tracks with region-associated scores.
All scores are assumed to be floats.
- Parameters
region_based1 – First
RegionBased
objectregion_based2 – Second
RegionBased
objectattribute – Name of the attribute to be compared. Typically “score”
file_name – Optional path to an output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog – If
True
, will log2-transform values after comparisonscore_field – Name of the attribute comparison scores will be saved to. Will use
attribute
if not provided.kwargs – Keyword arguments passed on to
regions()
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this regionkwargs – Passed to
write_gff()
-
class
-
class
fanc.architecture.comparisons.
FoldChangeScores
(*args, **kwargs)¶ Bases:
fanc.architecture.comparisons.ComparisonScores
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>], a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover to exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
Returned list is range-compatible, i.e. chromosome bins [0,5] cover chromosomes 1, 2, 3, and 4, not 5.
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table. :return: list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
compare
(score1, score2)¶ Compare two edge weights.
- Parameters
score1 – float
score2 – float
- Returns
float
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, :class:~GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
()¶ Write buffered data to file.
-
classmethod
from_scores
(scores1, scores2, attributes=None, file_name=None, tmpdir=None, log=False, field_prefix='cmp_', **kwargs)¶ Compare parameter-based scores in a
RegionScoreParameterTable
.- Parameters
scores1 – First
RegionScoreParameterTable
scores2 – Second
RegionScoreParameterTable
attributes – If
None
, will do all possible comparisons. Provide a list of region attributes (e.g. [“insulation_1000000”, “insulation_2000000”]) for specific comparisons.file_name – Optional path to an output file
tmpdir – Optional. If
True
, will work in temporary directory until file is closedlog – log2-transform values after comparison
field_prefix – Prefix of the output field
kwargs – Keyword arguments passed on to
regions()
- Returns
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of onlythis region. To get the slice over all regions leave this blank.- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or class:~GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a class:~GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or class:~GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.- Returns
RegionIter
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
score_regions
(parameter, **kwargs)¶ Construct a new object with regions that have a
score
attribute which corresponds to scores calculated with this parameter. :param parameter: Use scores calculated with this parameter (e.g. window size) :param kwargs: Keyword arguments passed toRegionsTable
:return:RegionScoreTable
-
scores
(parameter, scores=None)¶ Return scores for a specific parameter size as list.
- Parameters
parameter – Parameter scores were calculated for, such as window size
scores – If provided, set scores for this parameter to the ones in this list.
- Returns
list of scores
-
to_bed
(file_name, parameter, subset=None)¶ Write scores to BED file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_bigwig
(file_name, parameter, subset=None)¶ Write scores to BigWig file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
to_gff
(file_name, parameter, subset=None)¶ Write scores to GFF file.
- Parameters
file_name – Path to output file
parameter – Parameter the scores were calculated for, such as window size
subset – A
GenomicRegion
or region string specifying a region range to be written to file, e.g. “chr19:1-1mb”
-
class
-
fanc.architecture.comparisons.
hic_pca
(*hics, sample_size=None, region=None, strategy='variance', scale=True, log=False, ignore_zeros=False, oe_enrichment=None, min_distance=None, max_distance=None, background_ligation=False, min_libraries_above_background=1, **kwargs)¶ Run a PCA analysis on a set of Hi-C matrices.
Note: this is not a compartment analysis. Use
ABCompartmentMatrix
for that purpose.- Parameters
hics – Two or more Hi-C objects
sample_size – Optional. Set an upper limit on the number of edges sampled for this analysis. If not specified, will use all applicable edges. Used in conjunction with
strategy
to determine which edges to prioritiseregion – Optionally specify a region string to limit the PCA to that region.
strategy – Sort order of edges. Used in conjunction with
sample_size
. One of “variance” (default), “fold-change”, or “passthrough”. variance: edges sorted by variance of contact strength across samples; fold-change: edges sorted by size of fold-change of contact strength across samples; passthrough: unsorted, edges appear in order they are stored in the objectscale – If
True
(default), the matrix values are scaled to their sequencing depth before running PCA. If you are using the default normalisation, matrix entries correspond to contact probabilities and the margins are equal to 1 and there is not Need for scaling, so you can set this toFalse
in order to save computational time.log – Log-transform contact strength prior to PCA
ignore_zeros – Only use contacts that are non-zero in all samples
oe_enrichment – Used for “fold-change”
strategy
, at least on edge must have an O/E of this value or larger.min_distance – regions must be at least this far apart (in base pairs) to be used for PCA
max_distance – regions must be at least this close together (in base pairs) to be used for PCA
background_ligation – Use the average inter-chromosomal contact strength as background ligation signal and only use pixels where at least
min_libraries_above_background
samples have an O/E signal above backgroundmin_libraries_above_background – Minimum number of libraries/samples that must have an O/E above background ligation signal for each pixel.
kwargs – Keyword arguments passed to
edges()
- Returns
sklearn PCA object, PCA result