Loop calling¶
Loops frequently form between two genomic regions, and are visible in the Hi-C matrix as patches of increased contact intensity:
fancplot -o architecture/loops/rao2014.chr11_77400000_78600000.png \
chr11:77400000-78600000 \
-p triangular architecture/loops/rao2014.chr11_77400000_78600000.hic \
-vmin 0.0 -vmax 0.05 -m 600000
We can use fanc loops
to call loops in Hi-C matrices using the HICCUPS algorithm
(Rao and Huntley et al., 2014).
Please refer to the original paper for details on the algorithm, specifically the
different types of local neighborhoods defined to make loop calling robust.
usage: fanc loops [-h] [-c CHROMOSOMES [CHROMOSOMES ...]] [-p PEAK_SIZE]
[-w WIDTH] [-t THREADS] [-d MIN_DIST]
[-m MAPPABILITY_GLOBAL_CUTOFF]
[--mappability-donut MAPPABILITY_DONUT_CUTOFF]
[--mappability-horizontal MAPPABILITY_HORIZONTAL_CUTOFF]
[--mappability-vertical MAPPABILITY_VERTICAL_CUTOFF]
[--mappability-lower-left MAPPABILITY_LOWER_LEFT_CUTOFF]
[-q FDR_GLOBAL_CUTOFF] [--fdr-donut FDR_DONUT_CUTOFF]
[--fdr-horizontal FDR_HORIZONTAL_CUTOFF]
[--fdr-vertical FDR_VERTICAL_CUTOFF]
[--fdr-lower-left FDR_LOWER_LEFT_CUTOFF]
[-e ENRICHMENT_GLOBAL_CUTOFF]
[--enrichment-donut ENRICHMENT_DONUT_CUTOFF]
[--enrichment-horizontal ENRICHMENT_HORIZONTAL_CUTOFF]
[--enrichment-vertical ENRICHMENT_VERTICAL_CUTOFF]
[--enrichment-lower-left ENRICHMENT_LOWER_LEFT_CUTOFF]
[-o OBSERVED] [--rh-filter] [--sge]
[--batch-size BATCH_SIZE] [-j]
[--merge-distance MERGE_DISTANCE] [-s] [--fdr-sum FDR_SUM]
[-b BEDPE] [-f] [-tmp]
input [output]
Positional Arguments¶
- input
Input Hic file
- output
Output file. If input file is already a FAN-C compatible loops object, filtering can also be done in place.
Named Arguments¶
- -c, --chromosomes
Chromosomes to be investigated.
- -p, --peak-size
Size of the expected peak in pixels. If not set, will be estimated to correspond to ~ 25kb.
- -w, --neighborhood-width
Width of the investigated area surrounding pixels. If not set, will be estimated at p+3
- -t, --threads
Number of threads for parallel processing. Default: 1 - it is advised to set this as high as possible, since loop calling is very computationally expensive!
- -d, --min-distance
Minimum distance in bins for two loci to be considered as loops. Default: peak size. Set this value higher to exclude loops close to the diagonal.
- -m, --mappability
Global mappability filter for all neighborhoods.Minimum mappable fraction of a pixel neighborhood to consider pixel as loop. Can be overridden by filters for local neighborhoods.
- --mappability-donut
Mappability filter for donut neighborhood. Value between 0 and 1.
- --mappability-horizontal
Mappability filter for horizontal neighborhood. Value between 0 and 1.
- --mappability-vertical
Mappability filter for vertical neighborhood. Value between 0 and 1.
- --mappability-lower-left
Mappability filter for lower-left neighborhood. Value between 0 and 1.
- -q, --fdr
Global FDR filter all neighborhoods. Individual neighborhood filters can override this global setting. Value between 0 and 1.
- --fdr-donut
FDR filter for donut neighborhood. Value between 0 and 1.
- --fdr-horizontal
FDR filter for horizontal neighborhood. Value between 0 and 1.
- --fdr-vertical
FDR filter for vertical neighborhood. Value between 0 and 1.
- --fdr-lower-left
FDR filter for lower-left neighborhood. Value between 0 and 1.
- -e, --enrichment
Global observed/expected filter all neighborhoods. Individual neighborhood filters can override this global setting.
- --enrichment-donut
Observed/expected enrichment filter for donut neighborhood.
- --enrichment-horizontal
Observed/expected enrichment filter for horizontal neighborhood.
- --enrichment-vertical
Observed/expected enrichment filter for vertical neighborhood.
- --enrichment-lower-left
Observed/expected enrichment filter for lower-left neighborhood.
- -o, --observed
Minimum observed value (integer, uncorrected). Default: 1
- --rh-filter
Filter peaks as in Rao, Huntley et al. (2014), Cell. It only retains peaks that are at least 2-fold enriched over either the donut or lower-left neighborhood, at least 1.5-fold enriched over the horizontal and vertical neighborhoods, at least 1.75-fold enriched over both the donut and lower-left neighborhood, and have an FDR <= 0.1 in every neighborhood
- --sge
Run on SGE cluster. This option is highly recommended if you are running “fanc loops” on a Sun/Oracle Grid Engine Cluster. The “-t” option specifies the number of SGE slots if this flag is set. The main process will then submit jobs to the grid using “gridmap” and collect the results. (https://gridmap.readthedocs.io/)
- --batch-size
Width of submatrix examined per process. Default: 200
- -j, --merge-pixels
Merge individual pixels into peaks after filtering.
- --merge-distance
Maximum distance in base pairs at which to merge two pixels. Default 20000
- -s, --remove-singlets
Remove isolated pixels after merging step.
- --fdr-sum
FDR sum filter for merged peaks. Merged peaks where the sum of donut FDR values of all constituent pixels is larger than the specified cutoff are filtered.
- -b, --bedpe
BEDPE output file. When set, merged loops will be written to this file after all filtering steps have completed.
- -f, --force-overwrite
- If the specified output file exists, it will be
overwritten without warning.
- -tmp, --work-in-tmp
Work in temporary directory
Annotating pixels for loop calling¶
The first step in HICCUPS consists of annotating each pixel with various measures related to their loop probability. The most important ones are
Enrichment over expected values in the local neighborhood
FDR of the local enrichment
Mappability of the local neighborhood
fanc loops architecture/loops/rao2014.chr11_77400000_78600000.hic \
architecture/loops/rao2014.chr11_77400000_78600000.loops \
-t 2
When run like this, fanc loops
does not actually call any loops, but merely returns
a matrix object where every pixel is annotated with the above properties. Importantly,
as this is the most computationally expensive step, you are strongly advised to choose
a large number of threads using the -t
option. Even better, if you have access to
a computational cluster running Sun/Oracle Grid Engine, fanc loops
can automatically
submit annotation jobs to the cluster if you set the --sge
flag. The -t
option
then specifies the number of jobs allowed to run in parallel instead of the number of
local threads used for multiprocessing.
By default, fanc loops
assumes a loop size of 25kb. This determines the area around
a pixel that is not included in the local neighborhood calculations. If this is chosen
too small, the neighborhood will lie within the peak region, and enrichments are going
to be lower. If this is chosen too big, the neighborhood will no longer be local. If you
have reason to believe your loops size differes from the default, you can set it explicitly
with -p
.
Similarly, the width of the neighborhood is determined as p + 3
by default. If you want
to in- or decrease the neighborhood width, use the -w
parameter. You should know,
however, that this is just a starting value, and the neighbborhood width might be increased
on demand internally.
Finally, you can control the size of the submatrices sent to each thread using the
--batch-size
parameter. The default, 200, should suit most purposes, but if your
individual jobs are taking too long, you should reduce this number.
We can now use the output object with annotated pixels for downstream processing.
Filtering annotated pixels¶
We need to apply filters to the annotated pixel object that remove all pixels with a low probability of being a loop. These filters typically consist of enrichment filters, FDR filters, and mappability filters. Additionally, there are filters for minimum distance between regions, and the minimum number of unnormalised valid pairs in a pixel.
You can either set a global enrichment filter that acts on all neighborhoods
using -e
, or choose individual thresholds for each local neighborhood with
--enrichment-donut
, --enrichment-vertical
, --enrichment-horizontal
,
and --enrichment-lower-left
. You usually want to set at least the
--enrichment-donut
cutoff to something like 2.
For FDR values, also called q-values, you can set a global filter using -q
. Control
the filtering of individual neighborhoods using --fdr-donut
, --fdr-vertical
,
--fdr-horizontal
, and --fdr-lower-left
. Typical values for each neighborhood
are around 0.1.
Mappability filters act on pixels where a certain fraction of pixels in their local
neighborhoods is unmappable. To set a global mappability cutoff for all neighborhoods,
use the -m
option. Again, local neighborhood mappability filters can be fine-tuned
using the --mappability-donut
, --mappability-vertical
, --mappability-horizontal
,
and --mappability-lower-left
options.
It is generally a good idea to filter on the minimum distance between regions to
consider forming a loop, as a lot of false positive loops will be close to the diagonal.
You can use the -d <b>
parameter to set a threshold on the minimum distance, where
b is expressed in number of bins.
In addition, we highly recommend applying a filter on the minimum number of valid pairs
in a pixel (-o
), so that false-positive loops due to noise are avoided.
Finally, we have included the filter applied by Rao and Huntley et al. in their original
HICCUPS algorithm as a convenient preset --rh-filter
. It only retains peaks that are
at least 2-fold enriched over either the donut or lower-left neighborhood, at least
1.5-fold enriched over the horizontal and vertical neighborhoods, at least 1.75-fold
enriched over both the donut and lower-left neighborhood, and have an FDR <= 0.1 in
every neighborhood.
An example command could look like this:
fanc loops architecture/loops/rao2014.chr11_77400000_78600000.loops \
architecture/loops/rao2014.chr11_77400000_78600000_filtered.loops \
--rh-filter -d 5 -o 5
This filters the vast majority of pixels in the matrix.
Merging unfiltered pixels into loops¶
Pixels that pass all filtering steps are good candidates for loops. Often, these pixels
appear in clusters, which we merge/join in this step. Pixels that do not form a cluster
are generally false-positives, so we filter them using --remove-singlets
.
fanc loops architecture/loops/rao2014.chr11_77400000_78600000_filtered.loops \
architecture/loops/rao2014.chr11_77400000_78600000_merged.loops \
-j --remove-singlets
Exporting to BEDPE¶
Finally, we can export all the merged loops to BEDPE using -b
:
fanc loops architecture/loops/rao2014.chr11_77400000_78600000_merged.loops \
-b architecture/loops/rao2014.chr11_77400000_78600000_merged.bedpe