Generate, bin, filter, and correct Hic objects¶
After we have obtained a filtered ReadPairs
object, we can easily
convert it into a Hic
matrix.
hic_folder = mkdir(os.path.join(output_folder, 'hic'))
hic_file = os.path.join(hic_folder, 'example.hic')
hic = pairs.to_hic(file_name=hic_file)
Note that this only uses valid pairs for Hi-C object creation and creates fragment-level
Hi-C matrix, using the same fragment definitions as in the original ReadPairs
object.
Note
If you have multiple Hi-C objects for the same sample, for example from different replicates or sequencing runs, we recommend merging them at this stage:
from fanc.hic import Hic
merged_hic = Hic.merge([hic_rep1, hic_rep2]])
The fragment-level Hi-C objects can very easily be binned:
binned_hic = hic.bin(1000000,
file_name=os.path.join(hic_folder, 'binned_1mb.hic'),
threads=4)
Just like ReadPairs
, Hic
can also be filtered. Currently,
the two available filters are LowCoverageFilter
and
DiagonalFilter
:
from fanc.hic import LowCoverageFilter
lc_filter = LowCoverageFilter(binned_hic, rel_cutoff=0.2)
binned_hic.filter(lc_filter)
binned_hic.run_queued_filters()
LowCoverageFilter
removes all Hi-C “edges” (pixels) in regions that have low coverage. The cutoff can either be expressed in absolute numbers of pairs (cutoff
) or as a fraction of the median coverage of all regions (rel_cutoff
). We highly recommend filtering for low coverage before matrix balancingDiagonalFilter
removes all edges at the diagonal. Some researcher have achieved better normalisation results by setting the matrix diagonal to 0 before normalisation.
Finally, we can normalise the matrix using matrix balancing. You have the choice of either
Knight-Ruiz matrix balancing (KR, kr_balancing
) or iterative correction (ICE,
ice_balancing
). KR balancing is typically faster, but also consumes a lot
more memory, especially for high matrix resolutions. ICE is slow, and might not always converge
to a solution, but is much more memory efficient in its implementation.
from fanc.hic import kr_balancing
kr_balancing(binned_hic, whole_matrix=False,
restore_coverage=False)
With KR balancing, you can choose to restore the original valid pairs count using
restore_coverage=True
. If this is set to False
, edge weights represent ligation
probabilities rather than counts. If you set whole_matrix=True
, the balancing will be done
on the whole genome matrix. The default is to do it on a per-chromosome basis.
The bias vector will be stored in the matrix, and is automatically applied when querying the object.