.. _api_tads:
#######################
TADs and TAD boundaries
#######################
To follow this tutorial, download the FAN-C example data, for example through our
`Keeper library `_. Then run the
example ``fanc auto`` command in :ref:`example-fanc-auto` to generate all the
necessary files, and set up your Python session like this:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet domains setup
:end-before: end snippet domains setup
If you want to try the tutorial with an equivalent Cooler file, load the Hi-C file like
this instead:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet alternative cooler
:end-before: end snippet alternative cooler
or like this if you want to work with a Juicer file built from the same data:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet alternative juicer
:end-before: end snippet alternative juicer
Note that there may be minor differences in the results due to the "zooming" and balancing
applied by the different tools.
Also have a look at the command line documentation at :ref:`fanc-domains`, which explains
a lot of the principles behind insulation scores and directionality indexes with a couple
of helpful illustrations.
*****************
Insulation scores
*****************
Insulation scores are calculated and saved to disk using the :class:`~fanc.architecture.domains.InsulationScores`
class. To run a basic insulation score calculation on a Hi-C object, use
:class:`~fanc.architecture.domains.InsulationScores.from_hic`:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation basic
:end-before: end snippet insulation basic
The file created in ``file_name`` can later be easily loaded with :func:`~fanc.load`.
The resulting object stores all insulation score tracks for the ``window_sizes``
provided. It is derived from the :class:`~fanc.regions.RegionsTable` class, which is
:class:`~genomic_regions.RegionBased`. There is a lot of convenient functionality
bundled in these objects, which you can read about in :ref:`genomic_regions`.
You can access the calculated scores by using the
:func:`~fanc.architecture.domains.InsulationScores.scores` function, which requires
the window size of the scores you want to retrieve, and returns a list of scores,
one for each genomic region in the object:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation scores
:end-before: end snippet insulation scores
However, a more flexible way to retrieve scores, which allows things like subsetting and
retrieving region information at the same time, is the :func:`~genomic_regions.RegionBased.regions`
iterator. Scores are stored as a region attribute ``insulation_``:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation regions
:end-before: end snippet insulation regions
If you prefer to have a dedicated object for a particular window size, you can use
:func:`~fanc.architecture.domains.InsulationScores.score_regions`. The score is then
accessible via the ``score`` attribute:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation score_regions
:end-before: end snippet insulation score_regions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Insulation calculation parameters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
:class:`~fanc.architecture.domains.InsulationScores.from_hic` has several parameters that
allow you to customise the insulation score calculation. You can read everything them in the
linked API reference, and we will highlight the most important ones here.
``impute_missing`` allows you to replace missing or masked pixels with their expected values
prior to the insulation score calculation. While this avoids dealing with missing pixels,
this can be misleading in case larger regions are unmappable or have been filtered out for
some other reason, so enable this setting with caution. The related parameter
``na_threshold`` instead controls how much of a sliding window can be covered by missing
pixels before assigning it NaN by default. This is set to ``0.5`` by default - by increasing
this value you will obtain more robust scores, but also the number of NaNs will increase in
your results. Whatever value you choose, it is a tradeoff between information content and
accuracy. In most cases you should be okay with the default setting.
``normalise``, which is enabled by default, divides insulation scores by their chromosome
mean, so every score is expressed relative to the expected score on the same chromosome. By
disabling this option, you will get raw sums of weights in the sliding window. Normalisation
is stringly recommended, however. You can also normalise to a more local region instead of the
whole chromosome, which may account for local variations in contact intensity other than
insulation. For this purpose, set ``normalisation_window`` to a number of bins, for example
``300``, which is then the number of scores surrounding each region used for normalisation by
their mean.
After their computation and normalisation, insulation scores are log2-transformed. This
makes scores (roughly) symmetrical around 0. To disable this, for example if you have also
disabled ``normalise``, use ``log=False``.
The original insulation score definition uses the arithmetic mean to normalise the scores.
If you expect insulation score outliers in your data, which might affect the normalisation by
the mean, you can use a trimmed arithmetic mean instead by setting ``trim_mean_proportion``
to a value between 0 and 1. The top fraction of insulation scores is then discarded for
calculating the mean.
If you intend to compare scores from different samples (and have the ``normalise`` and
``log`` options enabled), a mathematically more accurate normalisation would be to use the
geometric mean instead. You can enable this by setting ``geometric_mean=True``.
~~~~~~~~~~~~~~~~~~~~~~~
Insulation score export
~~~~~~~~~~~~~~~~~~~~~~~
You can export scores to a human-readable file format using one of
:func:`~fanc.architecture.domains.InsulationScores.to_bed`, or
:func:`~fanc.architecture.domains.InsulationScores.to_gff`. Export to a fast BigWig track using
:func:`~fanc.architecture.domains.InsulationScores.to_bigwig`.
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation export
:end-before: end snippet insulation export
~~~~~~~~~~~~~~~~~~~~~~
Insulation score plots
~~~~~~~~~~~~~~~~~~~~~~
You can plot all calculated insulation scores using :class:`~fanc.plotting.GenomicVectorArrayPlot`:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation multiplot
:end-before: end snippet insulation multiplot
.. image:: images/domains_multi.png
If you only want to plot a single score track, use :class:`~fanc.plotting.LinePlot`:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet insulation singleplot
:end-before: end snippet insulation singleplot
.. image:: images/domains_single.png
You can read more about different plot types in :ref:`api_plot`.
**************
Boundary calls
**************
Once you have the ``insulation`` object, you can call insulating boundaries in the genome
using :class:`~fanc.architecture.domains.Boundaries`, specifically
:func:`~fanc.architecture.domains.Boundaries.from_insulation_score`:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet boundaries run
:end-before: end snippet boundaries run
This finds all minima in the insulation score vector and returns the corresponding regions
as a :class:`~fanc.regions.RegionsTable` object, which is :ref:`genomic_regions`. The
boundary strength is stored in the ``score`` attribute of each region:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet boundaries regions
:end-before: end snippet boundaries regions
By default, all minima are reported, including very weak and likely false-positive boundaries.
We recommend plotting the boundaries and their strength alongside the Hi-C matrix and then decide
on a score cutoff to only select a robust set of boundaries!
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet boundaries plot
:end-before: end snippet boundaries plot
.. image:: images/boundaries.png
You can filter out false-positives
manually like this
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet boundaries fp
:end-before: end snippet boundaries fp
or by simply using the ``min_score`` argument:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet boundaries minscore
:end-before: end snippet boundaries minscore
********************
Directionality index
********************
For a short explanation of the directionality index, please read :ref:`directionality-index`.
The principle to compute the directionality index is highly similar to the insulation score.
The dedicated class :class:`~fanc.architecture.domains.DirectionalityIndexes` handles all
relevant actions. Use :func:`~fanc.architecture.domains.DirectionalityIndexes.from_hic` to
compute it for multiple window sizes:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet directionality basic
:end-before: end snippet directionality basic
Plotting also works the same way, for all indexes:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet directionality multiplot
:end-before: end snippet directionality multiplot
.. image:: images/directionality_multi.png
And for single a single directionality index:
.. literalinclude:: code/domains_example_code.py
:language: python
:start-after: start snippet directionality singleplot
:end-before: end snippet directionality singleplot
.. image:: images/directionality_single.png