Working with individual genomic regions¶
Contents
Genomic intervals, or regions, as we call them from here on, are represented
by a api/regions/GenomicRegion object. This object has attributes
commonly associated with genomic regions, such as “chromosome”, “start” and “end”,
but can in principle have arbitrary attributes, including scores, labels, and other
useful properties describing the region. There is no restriction regarding the types
of attributes - any valid Python object can be associated with a
GenomicRegion.
This tutorial assumes you have imported the genomic_regions package like this:
import genomic_regions as gr
Creating genomic regions¶
You can create a genomic region by calling the GenomicRegion constructor:
region = gr.GenomicRegion(chromosome='chr1', start=1, end=1000)
# or to simplify
region = gr.GenomicRegion('chr1', 1, 1000)
start and end must be of type int.
The strand attribute also has special restrictions. It can either be a str
("+", "-", "."), an int (-1, +1) or None.
region = gr.GenomicRegion('chr1', 1, 1000, strand='+')
Other attributes have no restrictions, but we recommend that score be a float,
to show the expected behavior when working with the region later on.
region = gr.GenomicRegion('chr1', 1, 1000, strand='+',
score=1.765, name="my region",
my_list=[1, 2, 3, 4])
You can also add attributes to the region later on by using the set_attribute
method:
region = gr.GenomicRegion('chr1', 1, 1000, strand='+')
region.set_attribute("my_dict", {'a': 1, 'b': 2})
We advise the use of set_attribute rather than the builtin setattr
region.my_dict = {'a': 1, 'b': 2}, as some processing is done to the
key, value pair by GenomicRegion for compatibility.
Finally, you can quickly create GenomicRegion objects from strings using the
as_region convenience function:
region = gr.as_region('chr1:1-1000:+')
The region string should have the format <chromosome>:<start>-<end>[:<strand>].
start and end can use common abbreviations for kilo- and megabases,
support decimal and thousand separators, and are case-insensitive,
so writing gr.as_region('chr12:12500000-18000000') is the same as
gr.as_region('chr12:12.5Mb-18Mb') and gr.as_region('chr12:12.5mb-18mb') and
gr.as_region('chr12:12,500,000-18,000,000').
Genomic region methods¶
Basic¶
The GenomicRegion object comes loaded with useful attributes and methods,
most of which are self-explanatory:
len(region) # returns the size of the region in base pairs
region.center # returns the base (or fraction of base) at the center of the region
region.five_prime # returns the starting base at the 5' end of the region
region.three_prime # returns the starting base at the 3' end of the region
region.is_forward() # True if strand is '+' or '+1'
region.is_reverse() # True if strand is '-' or '-1'
region.attributes # return all attribute names in this region object
region.copy() # return a shallow copy of this region
region.to_string() # return a region identifier string describing the region
The strand attribute returns an integer (or None, if no strand is set).
To obtain a string, use the method strand_string, which returns one of
+, -, or ..
Modifying the region¶
Some methods are provided that modify the underlying region.
region.expand changes the size of the region on the chromosome, either by an
absolute amount in base pairs (using any of the parameters absolute,
absolute_left, or absolute_right), or relative, as a fraction of the
current region size (relative, relative_left, or relative_right).
By default, these actions return a modified copy of the original region, but you can
modify the region in place using copy=True.
region = gr.as_region('chr12:12.5Mb-18Mb')
print(region) # chr12:12500000-18000000
new_region = region.expand(absolute='1mb')
print(new_region) # chr12:11500000-19000000
print(region) # chr12:12500000-18000000
region.expand(relative=1.5, copy=False)
print(region) # chr12:4250000-26250000
You can also easily move a region on the same chromosome by adding or subtracting base pairs.
region = gr.as_region('chr12:12.5Mb-18Mb')
new_region = region + 1000000
print(new_region) # chr12:13500000-19000000
Some databases store chromosome names with the ‘chr’ prefix, others without. You can use
the method fix_chromosome to switch between chromosome formats:
region = gr.as_region('chr12:12.5Mb-18Mb')
new_region = region.fix_chromosome()
print(new_region) # 12:12500000-18000000
Relationship to other regions¶
You can easily check if a region overlaps with another region:
region = gr.as_region('chr12:12.5Mb-18Mb')
region.overlaps('chr12:11Mb-13Mb') # True
region.overlaps('chr12:11Mb-11.5Mb') # False
region.overlaps('chr1:11Mb-13Mb') # False
Similarly, you can get the extent of the overlap in base pairs:
region = gr.as_region('chr12:12.5Mb-18Mb')
region.overlap('chr12:11Mb-13Mb') # 500000
region.overlap('chr12:11Mb-11.5Mb') # 0
Next steps¶
Next, we will see how to work with lists and collections of regions in Working with collections of genomic regions.