AB compartments¶
The following steps assume that you ran the fanc auto
command in Example analysis.
Additionally, we set up the Python session like this:
import fanc
import fanc.plotting as fancplot
import matplotlib.pyplot as plt
hic_1mb = fanc.load("output/hic/binned/fanc_example_1mb.hic")
If you want to try the tutorial with an equivalent Cooler file, load the Hi-C file like this instead:
hic_1mb = fanc.load("architecture/other-hic/fanc_example.mcool@1mb")
or like this if you want to work with a Juicer file built from the same data:
hic_1mb = fanc.load("architecture/other-hic/fanc_example.juicer.hic@1mb")
Note that there may be minor differences in the results due to the “zooming” and balancing applied by the different tools.
AB correlation matrices can very easily be obtained from Hi-C files using the
from_hic()
function:
ab = fanc.ABCompartmentMatrix.from_hic(hic_1mb)
The ab
object acts like any FAN-C matrix (see RegionMatrixContainer), which means you
can query and subset the data any way you like. For example, to get the correlation matrix
of chromosome 18:
ab_chr18 = ab.matrix(('chr18', 'chr18'))
And to visualise the matrix:
fig, ax = plt.subplots()
mp = fancplot.SquareMatrixPlot(ab, ax=ax,
norm='lin', colormap='RdBu_r',
vmin=-1, vmax=1,
draw_minor_ticks=False)
mp.plot('chr18')
plt.show()
Eigenvectors¶
The AB correlation matrix eigenvector (EV) is used to determine if a region is in the active (A) or the inactive (B) compartment. It’s calculation is very straightforward:
ev = ab.eigenvector()
eigenvector()
returns a numpy
array with one entry per region in the AB correlation matrix (you can retrieve a matching
list of regions with regions()
).
You can also retrieve only the EV entries for a specific region using the sub_region
argument, but note that the calculation is always performed on the entire genome first
to avoid biases from subsetting.
Warning
Positive EV entries do not automatically mean a region is in the A compartment. In fact, if positive or negative entries are representing the A compartment is dependent on the implementation of PCA on the platform you are using. Therefore we strongly recommend using additional biological information to determine the correspondence between EV entry sign and compartment.
One option implemented in FAN-C is to use GC content as a proxy for activity, as GC-rich regions have been shown to be associated with the active compartment. FAN-C implements the use of a genomic FASTA file, to calculate GC content and then choose the EV sign so that positive entries correspond to A, and negative entries to the B compartment.
gc_ev = ab.eigenvector(genome='hg19_chr18_19.fa', force=True)
To plot the EV, you can use LinePlot
:
fig, ax = plt.subplots(figsize=(5, 2))
lp = fancplot.LinePlot(ab, colors=['darkturquoise'])
lp.plot('chr18')
plt.show()
Enrichment profiles (Saddle plot)¶
An enrichment profile, which is used to create a saddle_plot()
is used
to show how “interactive” genomic regions belonging to the A or B compartment are. To
calculate the enrichment profile, first all genomic regions are divided into bins, according
to their EV value (the “compartment strength”). Then, we use the O/E matrix the average O/E
value between all region bins, and take the log2 of the result. Everything is summarised in
a matrix, where rows and columns correspond to the genomic region bins, and matrix entries
reflect the bins’ interactivity. Positive values reflect more than expected contacts, while
negative values reflect less than expected contacts.
In FAN-C, you can use the enrichment_profile()
function for this purpose:
profile, cutoffs = ab.enrichment_profile(hic_1mb, genome='hg19_chr18_19.fa')
enrichment_profile()
also returns
the EV cutoffs calculated from the percentiles
argument. To get a higher resolution
of the enrichment matrix, use more finely-grained percentiles.
You can use the saddle_plot()
function to plot the results:
fig, axes = fancplot.saddle_plot(profile, cutoffs)
Since the layout of the matrix and the cutoffs barplot is somewhat complex, the function
generates its own figure and axes, which for the return values. You can, however,
specify your own axes using the axes
parameter. You need to supply three axes:
one for the matrix, one for the barplot, and one for the colorbar. This allows you to
integrate the saddle plot into more complex figures. If you supply None
as any
of the axes, the corresponding plot will not be generated.