.. _api_pca: ### PCA ### To follow this tutorial, download the FAN-C example data, for example through our `Keeper library `_. Then set up your Python session like this, loading some of our previously published `Low-C datasets `_: .. literalinclude:: code/pca_example_code.py :language: python :start-after: start snippet pca setup :end-before: end snippet pca setup PCA is one way in FAN-C to compare different Hi-C matrices to each other. A matrix of pixels vs matrices is assembled that contains the (normalised) contact strength of each matrix for the respective pixel (=region pair). PCA is then run on this matrix, and the resulting eigenvectors can be plotted to examine the variability between datasets. In FAN-C, simply use :func:`~fanc.architecture.comparisons.hic_pca` for this purpose, as shown here for chromosome 19: .. literalinclude:: code/pca_example_code.py :language: python :start-after: start snippet pca run :end-before: end snippet pca run We can plot the result using :func:`~fanc.plotting.pca_plot`: .. literalinclude:: code/pca_example_code.py :language: python :start-after: start snippet pca plot :end-before: end snippet pca plot .. image:: images/pca_default.png We can easily change the colors and markers, for example by colouring all samples with MboI and HindIII differently, and assigning different markers to samples with more or less than 1M cells: .. literalinclude:: code/pca_example_code.py :language: python :start-after: start snippet pca adjust :end-before: end snippet pca adjust .. image:: images/pca_adjust.png Sometimes the first eigenvector captures the library sequencing depth, so you may want to plot the second and third EVs instead using ``eigenvectors=(1,2)`` (in this case it does not seem to be particularly informative): .. literalinclude:: code/pca_example_code.py :language: python :start-after: start snippet pca ev :end-before: end snippet pca ev .. image:: images/pca_ev.png This kind of analysis can be tricky, and selecting informative pixels from the matrix can be key to getting a robust and intuitive PCA result. In the above example, we are using several parameters to select informative pixels for the PCA. First, we are only using pixels that are non-zero in all samples with ``ignore_zeros=True``. Second, we are sorting the pixels, listing the ones with the largest variance across samples first, using ``strategy='variance'``. Finally, we are selecting the top 100k pixels (those with the largest variance) first with ``sample_size=100000``. When you are analysing matrices of higher resolution, pixels far away from the diagonal might be dominated by noise. ``ignore_zeros`` removes most of the noisy pixels, but additionally you might want to set a ``max_distance`` to only select pixels corresponding to regions closer than this value. Similarly, if you want to exclude contacts close to the diagonal, use ``min_distance``. For more options have a look at the API reference for :func:`~fanc.plotting.pca_plot`.