Aggregate module

class fanc.architecture.aggregate.AggregateMatrix(file_name=None, mode='r', tmpdir=None, x=None, y=None)

Bases: fanc.general.FileGroup

Construct and store aggregate matrices from matrix-based objects.

Methods in this class can be used to generate various kinds of aggregate matrices, constructed by averaging the signal from different regions of a Hi-C (or similar) matrix. Aggregate matrices built from observed/expected data are particularly useful.

Class methods control how exactly an aggregate matrix is constructed:

  • AggregateMatrix.from_center() will aggregate Hi-C matrix regions along the diagonal in a fixed window around the region center. This is useful, for example, to observe the signal around TAD boundaries or other local features, such as the start of genes, enhancer locations, …

  • AggregateMatrix.from_regions() will extract submatrices using regions of variable size, such as TADs, and interpolate them to the same number of pixels before aggregating them.

  • AggregateMatrix.from_center_pairs() will extract arbitrary Hi-C submatrices from a list of region pairs, where the first region of each pair determines the rows and the second the columns of the submatrix. Each submatrix is centered on the point defined by the two regions, and a fixed number of pixels around this center is extracted. This is used, for example, to plot aggregate matrices around loops, using the loop anchors as input.

close(copy_tmp=True, remove_tmp=True)

Close this HDF5 file and run exit operations.

If the file was opened with tmpdir in read-only mode: close the file and delete the temporary copy.

If the file was opened with tmpdir in write or append mode: replace the original file with the copy and delete the copy.

Parameters
  • copy_tmp – If False, does not overwrite original with modified file.

  • remove_tmp – If False, does not delete temporary copy of file.
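The tmpdir behaviour of close() can be pictured with a small file-copy sketch. TmpCopyFile is a hypothetical class that only mimics the described workflow for ordinary files; it does not reproduce how fanc handles HDF5 files, and treating tmpdir=True as "use the system temporary directory" is an assumption.

```python
import os
import shutil
import tempfile


class TmpCopyFile:
    """Hypothetical helper mirroring the tmpdir/close() workflow described above.

    Not part of fanc: it only copies plain files around to show the logic.
    """

    def __init__(self, file_name, mode="r", tmpdir=None):
        self.original = file_name
        self.mode = mode
        self.tmp_path = None
        self.path = file_name  # path that reads/writes should use
        if tmpdir:
            # tmpdir=True -> system temp dir (assumption), else the given directory
            directory = None if tmpdir is True else tmpdir
            fd, self.tmp_path = tempfile.mkstemp(dir=directory)
            os.close(fd)
            if os.path.exists(file_name):
                shutil.copyfile(file_name, self.tmp_path)
            self.path = self.tmp_path

    def close(self, copy_tmp=True, remove_tmp=True):
        if self.tmp_path is None:
            return  # no temporary copy, nothing to do
        # Only write/append modes propagate changes back to the original file
        if copy_tmp and self.mode in ("w", "a"):
            shutil.copyfile(self.tmp_path, self.original)
        if remove_tmp:
            os.remove(self.tmp_path)
```

In read-only mode the copy is simply discarded, so any accidental modification of the temporary file never reaches the original.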

components(components=None)

Retrieve or store each individual submatrix composing the aggregate matrix.

Parameters

components – List of (masked) numpy arrays

Returns

List of (masked) numpy arrays
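The exact averaging used by AggregateMatrix is not spelled out here. A minimal sketch, assuming a plain pixel-wise mean that ignores masked pixels, shows how the stored components relate to the aggregate matrix; average_components is a hypothetical helper, not part of fanc.

```python
import numpy as np


def average_components(components):
    """Pixel-wise mean of equally sized (masked) submatrices.

    Masked pixels are ignored; a pixel masked in every component stays masked.
    """
    stack = np.ma.stack([np.ma.asarray(c) for c in components])
    return stack.mean(axis=0)
```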

classmethod from_center(matrix, regions, window=200000, rescale=False, scaling_exponent=-0.25, keep_components=True, file_name=None, tmpdir=None, region_viewpoint='center', **kwargs)

Construct an aggregate matrix from square regions along the diagonal with a fixed window size.

By default, the submatrix that is extracted from matrix is centred on the region centre and has a window size specified by window. You can change where the window will be centered using region_viewpoint, which can be any of “center”, “start”, “end”, “five_prime”, or “three_prime”. The latter two may be particularly useful for genomic features such as genes.
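The extraction described above can be sketched in plain numpy, assuming a dense single-chromosome matrix with a uniform bin size and regions given as (start, end, strand) tuples. All helper names are hypothetical; treating five_prime/three_prime as start/end unless the strand is "-", and skipping windows that run past the matrix edge, are assumptions rather than documented fanc behaviour.

```python
import numpy as np


def viewpoint_position(start, end, strand=None, viewpoint="center"):
    """Genomic coordinate the window is centred on (hypothetical helper)."""
    if viewpoint == "center":
        return (start + end) / 2
    if viewpoint == "start":
        return start
    if viewpoint == "end":
        return end
    minus = strand == "-"
    if viewpoint == "five_prime":
        return end if minus else start
    if viewpoint == "three_prime":
        return start if minus else end
    raise ValueError(f"unknown viewpoint: {viewpoint}")


def center_submatrix(matrix, bin_size, position, window):
    """Square window on the diagonal around `position`, or None near the edge."""
    center_bin = int(position // bin_size)
    half = int(window // bin_size) // 2
    lo, hi = center_bin - half, center_bin + half + 1
    if lo < 0 or hi > matrix.shape[0]:
        return None
    return matrix[lo:hi, lo:hi]


def aggregate_center(matrix, bin_size, regions, window, viewpoint="center"):
    """Average all in-bounds windows; regions are (start, end, strand) tuples."""
    subs = []
    for start, end, strand in regions:
        pos = viewpoint_position(start, end, strand, viewpoint)
        sub = center_submatrix(matrix, bin_size, pos, window)
        if sub is not None:
            subs.append(sub)
    return np.mean(subs, axis=0)
```

Because the window is converted to a whole number of bins on each side of the viewpoint bin, every submatrix has the same shape and can be averaged directly.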

Example for TAD boundaries:

import fanc
hic = fanc.load("/path/to/matrix.hic")
tad_boundaries = fanc.load("/path/to/tad_boundaries.bed")

# run aggregate analysis
am = fanc.AggregateMatrix.from_center(hic, tad_boundaries.regions,
                                      window=500000)
# extract matrix when done
m = am.matrix()

Parameters
  • matrix – An object of type RegionMatrixContainer, such as a Hic matrix

  • regions – A list of GenomicRegion objects

  • window – A window size in base pairs

  • rescale – If True, will use scaling_exponent to artificially rescale the aggregate matrix values using a power law

  • scaling_exponent – The power law exponent used if rescale is True

  • keep_components – If True (default) will store each submatrix used to generate the aggregate matrix in the AggregateMatrix object, which can be retrieved using AggregateMatrix.components()

  • file_name – If provided, stores the aggregate matrix object at this location.

  • tmpdir – If True, work in a temporary directory until the object is closed

  • region_viewpoint – Point on which the window is centred. Any of “center”, “start”, “end”, “five_prime”, or “three_prime”

  • kwargs – Keyword arguments passed to extract_submatrices()

Returns

aggregate matrix
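The documentation does not give the formula behind rescale. One plausible interpretation, shown here purely as an assumption, is to multiply each pixel of an observed/expected aggregate matrix by d ** scaling_exponent, where d is the genomic distance between the two bins, so that a flat O/E matrix regains a distance-decay shape. rescale_power_law and its bin_size argument are hypothetical.

```python
import numpy as np


def rescale_power_law(matrix, bin_size, scaling_exponent=-0.25):
    """Multiply each pixel by d ** scaling_exponent (hypothetical sketch).

    d is the distance between bin centres in base pairs; diagonal pixels
    (d == 0) are assigned a distance of one bin to avoid 0 ** negative.
    """
    n, m = matrix.shape
    i = np.arange(n)[:, None]
    j = np.arange(m)[None, :]
    d = np.abs(i - j) * float(bin_size)
    d[d == 0] = bin_size
    return matrix * d ** scaling_exponent
```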

classmethod from_center_pairs(hic, pair_regions, window=None, pixels=16, keep_components=True, file_name=None, tmpdir=None, region_viewpoint='center', **kwargs)

Construct an aggregate matrix from pairs of regions.

Parameters
  • hic – A compatible Hi-C matrix

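  • pair_regions – A list of region pairs (tuples of two regions); the first region of each pair determines the rows and the second the columns of the extracted submatrix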

  • window – A window size in base pairs

  • pixels – The dimension (in pixels) of the output matrix

  • keep_components – Keep all submatrices that make up the aggregate matrix

  • file_name – Optional path to an output file

  • tmpdir – Optional. If True, work in a temporary directory until the file is closed

  • region_viewpoint – Location in each region that is used as anchor for the extracted matrix. ‘center’ by default, also valid are ‘start’, ‘end’, ‘five_prime’, and ‘three_prime’

  • kwargs – Keyword arguments passed on to extract_submatrices()

Returns

AggregateMatrix
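For loops, from_center_pairs averages off-diagonal blocks. The numpy sketch below assumes the anchors have already been converted to bin indices of a dense matrix; pair_submatrix and aggregate_pairs are hypothetical, and the placement of the center for even pixel counts (here at index pixels // 2) is an assumption, not documented fanc behaviour.

```python
import numpy as np


def pair_submatrix(matrix, row_bin, col_bin, pixels=16):
    """pixels x pixels block around (row_bin, col_bin), or None near the edge."""
    half = pixels // 2
    r0, c0 = row_bin - half, col_bin - half
    r1, c1 = r0 + pixels, c0 + pixels
    if r0 < 0 or c0 < 0 or r1 > matrix.shape[0] or c1 > matrix.shape[1]:
        return None
    return matrix[r0:r1, c0:c1]


def aggregate_pairs(matrix, bin_pairs, pixels=16):
    """Average the blocks for all (row_bin, col_bin) pairs that fit."""
    subs = [pair_submatrix(matrix, r, c, pixels) for r, c in bin_pairs]
    subs = [s for s in subs if s is not None]
    return np.mean(subs, axis=0)
```

With real loop calls, a loop that is enriched relative to its surroundings shows up as a peak at the center pixel of the averaged block.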

classmethod from_regions(hic, tad_regions, pixels=90, rescale=False, scaling_exponent=-0.25, interpolation=0, boundary_mode='reflect', keep_mask=True, absolute_extension=0, relative_extension=1.0, keep_components=True, anti_aliasing=True, file_name=None, tmpdir=None, **kwargs)

Construct aggregate matrix from variable regions along the diagonal.

For each region in tad_regions, a submatrix is extracted and interpolated so that it is exactly pixels x pixels big. You can expand each region by a relative amount using relative_extension, or by a fixed number of base pairs using absolute_extension.

Example for aggregate TADs:

import fanc
hic = fanc.load("/path/to/matrix.hic")
tads = fanc.load("/path/to/tads.bed")

# run aggregate analysis
am = fanc.AggregateMatrix.from_regions(hic, tads.regions,
                                       relative_extension=3.)
# extract matrix when done
m = am.matrix()  # 90x90 matrix with aggregate TAD in the centre

Parameters
  • hic – An object of type RegionMatrixContainer, such as a Hic matrix

  • tad_regions – A list of GenomicRegion objects

  • pixels – Number of pixels along each dimension of the aggregate matrix

  • rescale – If True, will use scaling_exponent to artificially rescale the aggregate matrix values using a power law

  • scaling_exponent – The power law exponent used if rescale is True

  • interpolation – Type of interpolation used on each submatrix in range 0-5. 0: Nearest-neighbor (default), 1: Bi-linear, 2: Bi-quadratic, 3: Bi-cubic, 4: Bi-quartic, 5: Bi-quintic

  • boundary_mode – Points outside the boundaries of the input are filled according to the given mode. Options are constant, edge, symmetric, reflect, and wrap. Affects submatrix interpolation.

  • keep_mask – If True (default), masked Hi-C regions will also be interpolated.

  • absolute_extension – Absolute number of base pairs by which to expand each region

  • relative_extension – Factor by which to expand each region, relative to the region's own size. The default of 1.0 leaves regions unchanged; values smaller than 1 lead to region shrinking

  • keep_components – If True (default) will store each submatrix used to generate the aggregate matrix in the AggregateMatrix object, which can be retrieved using AggregateMatrix.components()

  • file_name – If provided, stores the aggregate matrix object at this location.

  • tmpdir – If True, work in a temporary directory until the object is closed

  • kwargs – Keyword arguments passed to extract_submatrices()

Returns

aggregate matrix
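The interpolation orders 0-5 and boundary modes listed above resemble the options of scikit-image's resize function, but the source does not say which routine fanc uses internally. The numpy sketch below only implements the default nearest-neighbour case (interpolation=0), ignoring boundary_mode and anti_aliasing, to show how regions of different sizes end up on a common pixels x pixels grid before averaging; both helpers are hypothetical.

```python
import numpy as np


def resize_nearest(submatrix, pixels=90):
    """Nearest-neighbour resampling of a 2D array to pixels x pixels.

    Each output pixel centre is mapped back to the input pixel containing it.
    """
    n_rows, n_cols = submatrix.shape
    rows = np.floor((np.arange(pixels) + 0.5) * n_rows / pixels).astype(int)
    cols = np.floor((np.arange(pixels) + 0.5) * n_cols / pixels).astype(int)
    return submatrix[np.ix_(rows, cols)]


def aggregate_regions(submatrices, pixels=90):
    """Resample every submatrix to the same size, then average pixel-wise."""
    return np.mean([resize_nearest(s, pixels) for s in submatrices], axis=0)
```

Higher interpolation orders would smooth the resampled matrices, at the cost of blurring sharp features such as TAD corners.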

matrix(m=None)

Retrieve or set the aggregate matrix in this object.

Parameters

m – Numpy array. If None, simply return the aggregate matrix stored in this object.

Returns

aggregate matrix

region_pairs(pairs=None)

Retrieve or set the regions used to generate the aggregate matrix.

Parameters

pairs – Iterable of region tuples of the form [(region1, region2), (region3, region4), …]. If None, simply return the region pairs in this object.

Returns

List of region pairs [(region1, region2), (region3, region4), …].