Aggregate module
-
class fanc.architecture.aggregate.AggregateMatrix(file_name=None, mode='r', tmpdir=None, x=None, y=None)
Bases: fanc.general.FileGroup
Construct and store aggregate matrices from matrix-based objects.
Methods in this class can be used to generate various kinds of aggregate matrices, constructed from averaging the signal from different regions of a Hi-C (or similar) matrix. Particularly useful is the creation of aggregate matrices from observed/expected data.
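For context, the following is a minimal sketch of one common way to turn a contact matrix into observed/expected values: divide each pixel by the mean of its diagonal, so that the expected value depends only on the distance between the two bins. It assumes a single symmetric, intra-chromosomal NumPy matrix with no masked bins. `observed_over_expected` is a hypothetical helper, and fanc computes its expected values itself, possibly in a different way.

```python
import numpy as np


def observed_over_expected(m):
    """Divide each pixel by the mean of its diagonal (hypothetical helper).

    Assumes one symmetric, intra-chromosomal matrix without masked bins.
    Diagonals whose mean is zero are left as NaN.
    """
    m = np.asarray(m, dtype=float)
    n = m.shape[0]
    oe = np.full_like(m, np.nan)
    for d in range(n):
        diag = np.diagonal(m, offset=d)
        expected = diag.mean()  # expected contact value at distance d
        if expected > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / expected  # upper triangle
            oe[idx + d, idx] = diag / expected  # mirror to lower triangle
    return oe
```

Values above 1 then mark more contacts than expected at that distance, which is why aggregates of such matrices highlight local enrichment rather than the distance decay.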
Class methods control how exactly an aggregate matrix is constructed:
AggregateMatrix.from_center() will aggregate Hi-C matrix regions along the diagonal in a fixed window around the region center. This is useful, for example, to observe the signal around TAD boundaries or other local features, such as the start of genes, enhancer locations, …
AggregateMatrix.from_regions() will extract sub-matrices using regions of variable size, such as TADs, and interpolate them to the same number of pixels before aggregating them.
AggregateMatrix.from_center_pairs() will extract arbitrary Hi-C submatrices from a list of region pairs, where the first region of each pair determines the matrix rows and the second the matrix columns. Each submatrix is centered on its region pair, and a fixed number of pixels around that center is extracted. This is used, for example, to plot aggregate matrices around loops, using the loop anchors as input.
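All three class methods end with the same averaging step over equally sized submatrices. A minimal sketch of that step, assuming NumPy submatrices in which NaN/inf values or masked pixels mark missing data; `aggregate` is a hypothetical helper, not part of fanc, and fanc's own averaging may treat missing pixels differently:

```python
import numpy as np


def aggregate(components):
    """Average equally sized submatrices pixel by pixel (hypothetical helper).

    NaN/inf values and masked pixels are ignored; a pixel that is missing in
    every component stays masked in the result.
    """
    stacked = np.ma.stack([np.ma.array(c, dtype=float) for c in components])
    stacked = np.ma.masked_invalid(stacked)  # treat NaN/inf as masked too
    return stacked.mean(axis=0)  # mean over components, per pixel
```

Using a masked mean rather than a plain mean keeps a single missing pixel in one component from turning the whole aggregate pixel into NaN.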
-
close(copy_tmp=True, remove_tmp=True)
Close this HDF5 file and run exit operations.
If the file was opened with tmpdir in read-only mode: close the file and delete the temporary copy.
If the file was opened with tmpdir in write or append mode: replace the original file with the copy and delete the copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
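The tmpdir rules above amount to a small copy-on-open, copy-back-on-close protocol. The sketch below illustrates it on ordinary files with the standard library only; the class `TmpCopy` and its `working_path` attribute are hypothetical, and fanc's real implementation operates on HDF5 files and may differ in detail.

```python
import os
import shutil
import tempfile


class TmpCopy:
    """Hypothetical stand-in illustrating the tmpdir behaviour described for close()."""

    def __init__(self, path, mode="r", tmpdir=None):
        self.path = path
        self.mode = mode
        self.tmp_path = None
        if tmpdir:
            # tmpdir=True -> system temp directory; otherwise treat tmpdir as a directory path
            directory = tempfile.gettempdir() if tmpdir is True else tmpdir
            fd, self.tmp_path = tempfile.mkstemp(dir=directory, suffix=os.path.basename(path))
            os.close(fd)
            if os.path.exists(path):
                shutil.copyfile(path, self.tmp_path)

    @property
    def working_path(self):
        # all reads and writes go to the temporary copy when one exists
        return self.tmp_path or self.path

    def close(self, copy_tmp=True, remove_tmp=True):
        if self.tmp_path is None:
            return  # no temporary copy: nothing to do in this sketch
        if self.mode in ("w", "a") and copy_tmp:
            # write/append mode: replace the original with the modified copy
            shutil.copyfile(self.tmp_path, self.path)
        if remove_tmp:
            os.remove(self.tmp_path)
```

In read-only mode any change made to the working copy is simply discarded when the copy is removed.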
-
components(components=None)
Retrieve or store each individual submatrix composing the aggregate matrix.
- Parameters
components – List of (masked) numpy arrays
- Returns
List of (masked) numpy arrays
-
classmethod from_center(matrix, regions, window=200000, rescale=False, scaling_exponent=-0.25, keep_components=True, file_name=None, tmpdir=None, region_viewpoint='center', **kwargs)
Construct an aggregate matrix from square regions along the diagonal with a fixed window size.
By default, the submatrix that is extracted from matrix is centred on the region centre and has a window size specified by window. You can change where the window is centred using region_viewpoint, which can be any of “center”, “start”, “end”, “five_prime”, or “three_prime”. The latter two may be particularly useful for genomic features such as genes.
Example for TAD boundaries:
```python
import fanc

hic = fanc.load("/path/to/matrix.hic")
tad_boundaries = fanc.load("/path/to/tad_boundaries.bed")

# run aggregate analysis
am = fanc.AggregateMatrix.from_center(hic, tad_boundaries.regions, window=500000)

# extract matrix when done
m = am.matrix()
```
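The choice of region_viewpoint decides which coordinate of each region the window is centred on. A minimal sketch of that choice, assuming integer start/end coordinates and a strand given as "+"/"-" (or 1/-1), where the 5' end of a minus-strand feature is its end coordinate. `viewpoint` is a hypothetical helper; fanc may, for example, keep fractional centres instead of using integer division.

```python
def viewpoint(start, end, strand="+", region_viewpoint="center"):
    """Return the position a window is centred on (hypothetical helper).

    Strand "-" or -1 counts as minus strand; anything else as plus strand.
    """
    minus = strand in ("-", -1)
    if region_viewpoint == "center":
        return (start + end) // 2  # integer midpoint in this sketch
    if region_viewpoint == "start":
        return start
    if region_viewpoint == "end":
        return end
    if region_viewpoint == "five_prime":
        return end if minus else start  # 5' end flips on the minus strand
    if region_viewpoint == "three_prime":
        return start if minus else end  # 3' end flips on the minus strand
    raise ValueError(f"Unknown region_viewpoint: {region_viewpoint!r}")
```

This is why “five_prime” and “three_prime” suit genes: a gene's transcription start site is its start on the plus strand but its end on the minus strand.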
- Parameters
matrix – An object of type RegionMatrixContainer, such as a Hic matrix
regions – A list of GenomicRegion objects
window – A window size in base pairs
rescale – If True, will use scaling_exponent to artificially rescale the aggregate matrix values using a power law
scaling_exponent – The power law exponent used if rescale is True
keep_components – If True (default), will store each submatrix used to generate the aggregate matrix in the AggregateMatrix object, which can be retrieved using AggregateMatrix.components()
file_name – If provided, stores the aggregate matrix object at this location.
tmpdir – If True, will work in a temporary directory until the object is closed
region_viewpoint – Point on which the window is centred; any of “center”, “start”, “end”, “five_prime”, or “three_prime”
kwargs – Keyword arguments passed to extract_submatrices()
- Returns
aggregate matrix
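Conceptually, from_center cuts the same square window around each viewpoint on the diagonal and averages the windows. The sketch below shows this on a plain NumPy matrix, assuming the viewpoints have already been converted to bin indices (e.g. position // bin size) and that the window is given as a number of bins on each side. `aggregate_from_center` is hypothetical, and skipping windows that run past the matrix edge is this sketch's choice, not documented fanc behaviour.

```python
import numpy as np


def aggregate_from_center(m, center_bins, half_window_bins):
    """Average square windows centred on diagonal bins (hypothetical sketch).

    Windows that would run past the matrix edge are skipped.
    """
    m = np.asarray(m, dtype=float)
    n = m.shape[0]
    windows = []
    for c in center_bins:
        lo, hi = c - half_window_bins, c + half_window_bins + 1
        if lo < 0 or hi > n:
            continue  # window does not fit inside the matrix
        windows.append(m[lo:hi, lo:hi])
    if not windows:
        raise ValueError("no window fits inside the matrix")
    return np.mean(windows, axis=0)
```

Because every window has the same size, no interpolation is needed here, unlike from_regions.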
-
classmethod from_center_pairs(hic, pair_regions, window=None, pixels=16, keep_components=True, file_name=None, tmpdir=None, region_viewpoint='center', **kwargs)
Construct an aggregate matrix from pairs of regions.
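For pairs such as loop anchors, the submatrix is no longer on the diagonal: its rows come from the first region and its columns from the second. A minimal sketch on a plain NumPy matrix, assuming both anchors are already converted to bin indices and using only the pixels size (the window parameter is ignored here). `aggregate_from_pairs` is hypothetical, and skipping pairs near the matrix edge is an assumption of the sketch.

```python
import numpy as np


def aggregate_from_pairs(m, bin_pairs, pixels=16):
    """Average pixels x pixels submatrices centred on (row_bin, col_bin) pairs.

    Hypothetical sketch; pairs whose submatrix would leave the matrix are skipped.
    """
    m = np.asarray(m, dtype=float)
    half = pixels // 2
    windows = []
    for row_bin, col_bin in bin_pairs:
        r0, c0 = row_bin - half, col_bin - half  # top-left corner of the submatrix
        if r0 < 0 or c0 < 0 or r0 + pixels > m.shape[0] or c0 + pixels > m.shape[1]:
            continue
        windows.append(m[r0:r0 + pixels, c0:c0 + pixels])
    if not windows:
        raise ValueError("no submatrix fits inside the matrix")
    return np.mean(windows, axis=0)
```

A loop then shows up as enrichment in the centre pixel of the aggregate, where the two anchors meet.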
- Parameters
hic – A compatible Hi-C matrix
pair_regions – A list of region pairs
window – A window size in base pairs
pixels – The dimension (in pixels) of the output matrix
keep_components – Keep all submatrices that make up the aggregate matrix
file_name – Optional path to an output file
tmpdir – Optional. If True, will work in a temporary directory until the file is closed
region_viewpoint – Location in each region that is used as the anchor for the extracted matrix. ‘center’ by default; also valid are ‘start’, ‘end’, ‘five_prime’, and ‘three_prime’
kwargs – Keyword arguments passed on to
extract_submatrices()
- Returns
aggregate matrix
-
classmethod from_regions(hic, tad_regions, pixels=90, rescale=False, scaling_exponent=-0.25, interpolation=0, boundary_mode='reflect', keep_mask=True, absolute_extension=0, relative_extension=1.0, keep_components=True, anti_aliasing=True, file_name=None, tmpdir=None, **kwargs)
Construct an aggregate matrix from variable regions along the diagonal.
For each region in tad_regions, a submatrix is extracted and interpolated so that it is exactly pixels x pixels in size. You can expand each region by a relative amount using relative_extension.
Example for aggregate TADs:
```python
import fanc

hic = fanc.load("/path/to/matrix.hic")
tads = fanc.load("/path/to/tads.bed")

# run aggregate analysis
am = fanc.AggregateMatrix.from_regions(hic, tads.regions, relative_extension=3.)

# extract matrix when done
m = am.matrix()  # 90x90 matrix with aggregate TAD in the centre
```
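Since TADs differ in size, each extended submatrix must be resampled to a common shape before averaging. The sketch below illustrates this with the nearest-neighbour case (interpolation=0) in plain NumPy. It assumes regions given as bin indices with an exclusive end, and assumes relative_extension scales the region's total span around its centre, so that 3.0 leaves the original region in the middle third (matching the "aggregate TAD in the centre" comment). The helpers `extended_bounds`, `resize_nearest` and `aggregate_from_regions` are hypothetical; absolute_extension, rescaling and the higher interpolation orders are not covered.

```python
import numpy as np


def extended_bounds(start_bin, end_bin, relative_extension=1.0):
    """Scale a region's bin span around its centre (assumed semantics)."""
    center = (start_bin + end_bin) / 2
    half = (end_bin - start_bin) * relative_extension / 2
    return int(round(center - half)), int(round(center + half))


def resize_nearest(sub, pixels):
    """Nearest-neighbour resize of a square matrix to pixels x pixels."""
    n = sub.shape[0]
    # source index whose bin contains the centre of each output pixel
    idx = ((2 * np.arange(pixels) + 1) * n) // (2 * pixels)
    return sub[np.ix_(idx, idx)]


def aggregate_from_regions(m, region_bins, pixels=90, relative_extension=1.0):
    """Extend, resample and average diagonal submatrices (hypothetical sketch)."""
    m = np.asarray(m, dtype=float)
    resized = []
    for start_bin, end_bin in region_bins:
        lo, hi = extended_bounds(start_bin, end_bin, relative_extension)
        if lo < 0 or hi > m.shape[0] or hi <= lo:
            continue  # extended region leaves the matrix
        resized.append(resize_nearest(m[lo:hi, lo:hi], pixels))
    if not resized:
        raise ValueError("no region fits inside the matrix")
    return np.mean(resized, axis=0)
```

The interpolation, boundary_mode and anti_aliasing parameter names resemble those of scikit-image's resize function, which suggests a higher-order interpolation is used for values above 0; that is an inference, not something this documentation states.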
- Parameters
hic – An object of type RegionMatrixContainer, such as a Hic matrix
tad_regions – A list of GenomicRegion objects
pixels – Number of pixels along each dimension of the aggregate matrix
rescale – If True, will use scaling_exponent to artificially rescale the aggregate matrix values using a power law
scaling_exponent – The power law exponent used if rescale is True
interpolation – Type of interpolation used on each submatrix in range 0-5. 0: Nearest-neighbor (default), 1: Bi-linear, 2: Bi-quadratic, 3: Bi-cubic, 4: Bi-quartic, 5: Bi-quintic
boundary_mode – Points outside the boundaries of the input are filled according to the given mode. Options are constant, edge, symmetric, reflect, and wrap. Affects submatrix interpolation.
keep_mask – If True (default), masked Hi-C regions will also be interpolated.
absolute_extension – Absolute number of base pairs by which to expand each region
relative_extension – Amount by which to expand each region as a fraction of each region. Values smaller than 1 lead to region shrinking
keep_components – If True (default), will store each submatrix used to generate the aggregate matrix in the AggregateMatrix object, which can be retrieved using AggregateMatrix.components()
file_name – If provided, stores the aggregate matrix object at this location.
tmpdir – If True, will work in a temporary directory until the object is closed
kwargs – Keyword arguments passed to extract_submatrices()
- Returns
aggregate matrix
-
matrix(m=None)
Retrieve or set the aggregate matrix in this object.
- Parameters
m – Numpy matrix
- Returns
aggregate matrix
-
region_pairs(pairs=None)
Retrieve or set the regions used to generate the aggregate matrix.
- Parameters
pairs – Iterable of region tuples of the form [(region1, region2), (region3, region4), …]. If None, simply return the region pairs in this object.
- Returns
List of region pairs [(region1, region2), (region3, region4), …].