Generate, bin, filter, and correct Hic objects¶
After we have obtained a filtered ReadPairs object, we can easily
convert it into a Hic matrix.
hic_folder = mkdir(os.path.join(output_folder, 'hic'))
hic_file = os.path.join(hic_folder, 'example.hic')
hic = pairs.to_hic(file_name=hic_file)
Note that this only uses valid pairs for Hi-C object creation and creates fragment-level
Hi-C matrix, using the same fragment definitions as in the original ReadPairs
object.
Note
If you have multiple Hi-C objects for the same sample, for example from different replicates or sequencing runs, we recommend merging them at this stage:
from fanc.hic import Hic
merged_hic = Hic.merge([hic_rep1, hic_rep2]])
The fragment-level Hi-C objects can very easily be binned:
binned_hic = hic.bin(1000000,
file_name=os.path.join(hic_folder, 'binned_1mb.hic'),
threads=4)
Just like ReadPairs, Hic can also be filtered. Currently,
the two available filters are LowCoverageFilter and
DiagonalFilter:
from fanc.hic import LowCoverageFilter
lc_filter = LowCoverageFilter(binned_hic, rel_cutoff=0.2)
binned_hic.filter(lc_filter)
binned_hic.run_queued_filters()
LowCoverageFilterremoves all Hi-C “edges” (pixels) in regions that have low coverage. The cutoff can either be expressed in absolute numbers of pairs (cutoff) or as a fraction of the median coverage of all regions (rel_cutoff). We highly recommend filtering for low coverage before matrix balancingDiagonalFilterremoves all edges at the diagonal. Some researcher have achieved better normalisation results by setting the matrix diagonal to 0 before normalisation.
Finally, we can normalise the matrix using matrix balancing. You have the choice of either
Knight-Ruiz matrix balancing (KR, kr_balancing) or iterative correction (ICE,
ice_balancing). KR balancing is typically faster, but also consumes a lot
more memory, especially for high matrix resolutions. ICE is slow, and might not always converge
to a solution, but is much more memory efficient in its implementation.
from fanc.hic import kr_balancing
kr_balancing(binned_hic, whole_matrix=False,
restore_coverage=False)
With KR balancing, you can choose to restore the original valid pairs count using
restore_coverage=True. If this is set to False, edge weights represent ligation
probabilities rather than counts. If you set whole_matrix=True, the balancing will be done
on the whole genome matrix. The default is to do it on a per-chromosome basis.
The bias vector will be stored in the matrix, and is automatically applied when querying the object.