Mapping FASTQ files¶
To map FASTQ files using the API, we will use the fanc.map
module.
Mappers¶
First, we need to decide which mapper to use, and under what circumstances an unaligned
read should be truncated and re-mapped to the genome. The SimpleBowtie2Mapper
,
for example, only attempts to align a read once and does no iterative mapping. The
Bowtie2Mapper
resubmits unaligned reads and reads with a low score,
There are equivalent mappers for BWA named
SimpleBWAMapper
and BWAMapper
, but here we
choose the Bowtie2Mapper
. It requires only the path of the corresponding
bowtie2
index:
mapper = map.Bowtie2Mapper('hg19_chr18_19/hg19_chr18_19',
threads=4, min_quality=30)
The threads
parameter controls how many threads are given to each bowtie2-align
process. By using a .bam ending, the output is converted to a BAM file at the end of
mapping automatically.
Iterative mapping¶
Now we can use fanc.map.iterative_mapping()
to start the actual mapping process:
sam_folder = mkdir(os.path.join(output_folder, 'sam'))
sam_1_file = map.iterative_mapping('SRR4271982_chr18_19_1.fastq.gzip',
os.path.join(sam_folder, 'SRR4271982_1.bam'),
mapper,
threads=1, min_size=15, step_size=10,
restriction_enzyme='HindIII')
sam_2_file = map.iterative_mapping('SRR4271982_chr18_19_2.fastq.gzip',
os.path.join(sam_folder, 'SRR4271982_2.bam'),
mapper,
threads=1, min_size=15, step_size=10,
restriction_enzyme='HindIII')
Note that we are calling iterative mapping twice, independently for each FASTQ file, as
appropriate for a Hi-C experiment. min_size
determines the minimum size of a truncated
read after which it will be discarded, while step_size
determines the truncation amount.
Note
When downloading FASTQ files from SRA using SRAtools, e.g. with fastq-dump, do not
use the -I / --readids
option, which appends .1
or .2
to the read name. This
interferes with the sorting and read pairing step in FAN-C. Read names of the two mates
must be identical.
By providing a restriction_enzyme
name, we enable ligation junction splitting. Each read
will be scanned for a predicted ligation junction of the provided restriction enzyme and if
one is encountered, it will be split at the junction before alignment. This can greatly increase
alignment rates, especially for longer reads.
SAM/BAM sorting¶
After the mapping is complete, we can sort the files by read name for further processing.
from fanc.tools.files import sort_natural_sam
sorted_sam_1_file = sort_natural_sam(sam_1_file)
sorted_sam_2_file = sort_natural_sam(sam_2_file)
The above command replaces the original file with the sorted version. You can use the
output_file
parameter to output to a different file, if you prefer to keep the unsorted
version.