Matrix module
-
class fanc.matrix.Edge(source, sink, _weight_field='weight', **kwargs)
Bases: object
A contact / an Edge between two genomic regions.
-
source
The index of the “source” genomic region. By convention, source <= sink.
-
sink
The index of the “sink” genomic region.
-
bias
Bias factor obtained via normalisation of the Hi-C matrix.
-
source_node
The first GenomicRegion in this contact.
-
sink_node
The second GenomicRegion in this contact.
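The source <= sink convention can be illustrated with a small sketch. EdgeSketch and make_edge are hypothetical names used only here, not part of fanc, and the sketch assumes (as the convention implies) that a contact between regions i and j is the same contact as one between j and i.

```python
from typing import NamedTuple


class EdgeSketch(NamedTuple):
    """Hypothetical stand-in for an Edge: two region indices and a weight."""
    source: int
    sink: int
    weight: float = 1.0


def make_edge(i: int, j: int, weight: float = 1.0) -> EdgeSketch:
    """Build an edge that obeys the source <= sink convention.

    Contacts are treated as symmetric, so (i, j) and (j, i) describe the
    same pair; ordering the indices gives each pair one canonical form.
    """
    source, sink = (i, j) if i <= j else (j, i)
    return EdgeSketch(source, sink, weight)


e = make_edge(7, 3, weight=12.0)
print(e)  # EdgeSketch(source=3, sink=7, weight=12.0)
```

Normalising at construction time means lookups and duplicate checks never have to consider both orientations of a pair.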
-
-
class fanc.matrix.LazyEdge(row, regions_table=None, _weight_field='weight')
Bases: object
An Edge equivalent that supports lazy loading.
-
source
The index of the “source” genomic region. By convention, source <= sink.
-
sink
The index of the “sink” genomic region.
-
bias
Bias factor obtained via normalisation of the Hi-C matrix.
-
source_node
The first GenomicRegion in this contact.
-
sink_node
The second GenomicRegion in this contact.
-
-
class fanc.matrix.MutableLazyEdge(row, regions_table=None, _weight_field='weight')
Bases: fanc.matrix.LazyEdge
-
update
()
Write changes made to the PyTables row to file.
-
-
class fanc.matrix.RegionMatrix
Bases: numpy.ma.core.MaskedArray
Subclass of masked_array with genomic region support.
Objects of this type are returned by matrix(). RegionMatrix supports subsetting by GenomicRegion and by region strings of the form <chromosome>[:<start>-<end>].
import fanc
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
m = hic.matrix(('chr18', 'chr18'))
type(m)  # fanc.matrix.RegionMatrix
m_sub = m['chr18:1-5mb', 'chr18:1-10mb']
type(m_sub)  # fanc.matrix.RegionMatrix
m_sub.shape  # (5, 10)
m_sub.row_regions  # [chr18:1-1000000, chr18:1000001-2000000,
                   #  chr18:2000001-3000000, chr18:3000001-4000000,
                   #  chr18:4000001-5000000]
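The region-string selection in the example can be approximated with the sketch below. It is not fanc's parser: parse_region and overlapping_bins are hypothetical helpers, and the sketch assumes 1-based inclusive coordinates, optional kb/mb suffixes, and the fixed 1 Mb bins of the example file. Under those assumptions it reproduces the 5 by 10 shape of m_sub.

```python
import re

# Multipliers for the suffixes assumed here; a bare number is in bp.
_SUFFIX = {"": 1, "kb": 1_000, "mb": 1_000_000}
_REGION = re.compile(r"^([^:]+)(?::(\d+)(kb|mb)?-(\d+)(kb|mb)?)?$", re.IGNORECASE)


def parse_region(text):
    """Parse '<chromosome>[:<start>-<end>]' into (chrom, start, end).

    start and end are None when only a chromosome name is given.
    """
    match = _REGION.match(text)
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    chrom, start, s_unit, end, e_unit = match.groups()
    if start is None:
        return chrom, None, None
    start = int(start) * _SUFFIX[(s_unit or "").lower()]
    end = int(end) * _SUFFIX[(e_unit or "").lower()]
    return chrom, start, end


def overlapping_bins(start, end, bin_size):
    """Indices of fixed-size bins (1-based, inclusive coords) touching [start, end]."""
    first = (start - 1) // bin_size
    last = (end - 1) // bin_size
    return list(range(first, last + 1))


_, start, end = parse_region("chr18:1-5mb")
rows = overlapping_bins(start, end, 1_000_000)
_, start, end = parse_region("chr18:1-10mb")
cols = overlapping_bins(start, end, 1_000_000)
print(len(rows), len(cols))  # 5 10
```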
If the associated row or column regions have a valid attribute that is False, the corresponding rows/columns of the RegionMatrix will be masked.
-
row_regions
A list of regions matching the first matrix dimension.
-
col_regions
A list of regions matching the second matrix dimension.
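The masking rule for regions whose valid attribute is False can be sketched with plain numpy.ma. The boolean arrays row_valid and col_valid are hypothetical stand-ins for the valid attributes of row_regions and col_regions; in this sketch a cell is masked when either its row region or its column region is invalid.

```python
import numpy as np

# Hypothetical validity flags, one per row region and one per column region.
row_valid = np.array([True, False, True])
col_valid = np.array([True, True, False, True])

data = np.arange(12, dtype=float).reshape(3, 4)

# Broadcast the flags: a cell is masked if its row OR its column is invalid.
mask = ~row_valid[:, None] | ~col_valid[None, :]
m = np.ma.masked_array(data, mask=mask)

print(m.count())  # 6 cells remain: rows {0, 2} x columns {0, 1, 3}
```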
-
all
(axis=None, out=None, keepdims=<no value>)
Returns True if all elements evaluate to True.
The output array is masked where all the values along the given axis are masked: if the output would have been a scalar and all the values are masked, then the output is masked.
Refer to numpy.all for full documentation.
See also
ndarray.all()
corresponding function for ndarrays
numpy.all()
equivalent function
Examples
>>> np.ma.array([1,2,3]).all()
True
>>> a = np.ma.array([1,2,3], mask=True)
>>> (a.all() is np.ma.masked)
True
-
anom
(axis=None, dtype=None)
Compute the anomalies (deviations from the arithmetic mean) along the given axis.
Returns an array of anomalies, with the same shape as the input and where the arithmetic mean is computed along the given axis.
- Parameters
axis (int, optional) – Axis over which the anomalies are taken. The default is to use the mean of the flattened array as reference.
dtype (dtype, optional) – Type to use in computing the mean (and hence the anomalies). For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
See also
mean()
Compute the mean of the array.
Examples
>>> a = np.ma.array([1,2,3])
>>> a.anom()
masked_array(data=[-1., 0., 1.], mask=False, fill_value=1e+20)
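A sketch of anom on data containing a masked value: the mean is computed over the unmasked entries only (here 2.0), and the masked position stays masked in the result. The values are chosen only for illustration.

```python
import numpy as np

a = np.ma.array([1, 2, 3, 10], mask=[False, False, False, True])
dev = a.anom()  # deviations from the mean of the unmasked values

print(dev.compressed().tolist())  # [-1.0, 0.0, 1.0]
print(dev.mask.tolist())          # [False, False, False, True]
```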
-
any
(axis=None, out=None, keepdims=<no value>)
Returns True if any of the elements of a evaluate to True.
Masked values are considered as False during computation.
Refer to numpy.any for full documentation.
See also
ndarray.any()
corresponding function for ndarrays
numpy.any()
equivalent function
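A sketch contrasting any() on a masked array with the same call on its raw data, plus all() for comparison. It relies on numpy.ma treating masked entries as False for any() and, in current numpy.ma implementations, filling them with True for all(), so in both cases only unmasked values decide the result.

```python
import numpy as np

a = np.ma.array([0, 5, 0], mask=[False, True, False])

# any(): masked entries count as False, so the masked 5 is ignored.
print(bool(a.any()))       # False
# The underlying data still contains a non-zero value.
print(bool(a.data.any()))  # True

# all(): the masked 0 does not make the result False.
b = np.ma.array([1, 0], mask=[False, True])
print(bool(b.all()))       # True
```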
-
argmax
(axis=None, fill_value=None, out=None)
Returns array of indices of the maximum values along the given axis. Masked values are treated as if they had the value fill_value.
- Parameters
axis ({None, integer}) – If None, the index is into the flattened array, otherwise along the specified axis
fill_value ({var}, optional) – Value used to fill in the masked values. If None, the output of maximum_fill_value(self._data) is used instead.
out ({None, array}, optional) – Array into which the result can be placed. Its type is preserved and it must be of the right shape to hold the output.
- Returns
index_array
- Return type
{integer_array}
Examples
>>> a = np.arange(6).reshape(2,3)
>>> a.argmax()
5
>>> a.argmax(0)
array([1, 1, 1])
>>> a.argmax(1)
array([2, 2])
-
argmin
(axis=None, fill_value=None, out=None)
Return array of indices to the minimum values along the given axis.
- Parameters
axis ({None, integer}) – If None, the index is into the flattened array, otherwise along the specified axis
fill_value ({var}, optional) – Value used to fill in the masked values. If None, the output of minimum_fill_value(self._data) is used instead.
out ({None, array}, optional) – Array into which the result can be placed. Its type is preserved and it must be of the right shape to hold the output.
- Returns
If multi-dimension input, returns a new ndarray of indices to the minimum values along the given axis. Otherwise, returns a scalar of index to the minimum values along the given axis.
- Return type
ndarray or scalar
Examples
>>> x = np.ma.array(np.arange(4), mask=[1,1,0,0])
>>> x.shape = (2,2)
>>> x
masked_array(data=[[--, --], [2, 3]], mask=[[ True, True], [False, False]], fill_value=999999)
>>> x.argmin(axis=0, fill_value=-1)
array([0, 0])
>>> x.argmin(axis=0, fill_value=9)
array([1, 1])
-
argpartition
(kth, axis=-1, kind='introselect', order=None)
Returns the indices that would partition this array.
Refer to numpy.argpartition for full documentation.
New in version 1.8.0.
See also
numpy.argpartition()
equivalent function
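A short sketch of argpartition on a plain ndarray (this entry describes no mask-specific behaviour, so masks are left out). The element at position kth lands where it would be in a fully sorted array; smaller values come before it and larger values after it, in no guaranteed order.

```python
import numpy as np

a = np.array([7, 1, 5, 3, 9])
idx = a.argpartition(2)  # place the element of rank 2 at position 2

print(a[idx[2]])                    # 5, the third-smallest value
print(sorted(a[idx[:2]].tolist()))  # [1, 3], the values before it
print(sorted(a[idx[3:]].tolist()))  # [7, 9], the values after it
```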
-
argsort
(axis=<no value>, kind=None, order=None, endwith=True, fill_value=None)
Return an ndarray of indices that sort the array along the specified axis. Masked values are filled beforehand to fill_value.
- Parameters
axis (int, optional) –
Axis along which to sort. If None, the default, the flattened array is used.
Changed in version 1.13.0: Previously, the default was documented to be -1, but that was in error. At some future date, the default will change to -1, as originally intended. Until then, the axis should be given explicitly when arr.ndim > 1, to avoid a FutureWarning.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – The sorting algorithm used.
order (list, optional) – When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. Not all fields need be specified.
endwith ({True, False}, optional) – Whether missing values (if any) should be treated as the largest values (True) or the smallest values (False). When the array contains unmasked values at the same extremes of the datatype, the ordering of these values and the masked values is undefined.
fill_value ({var}, optional) – Value used internally for the masked values. If fill_value is not None, it supersedes endwith.
- Returns
index_array – Array of indices that sort a along the specified axis. In other words, a[index_array] yields a sorted a.
- Return type
ndarray, int
See also
MaskedArray.sort()
Describes sorting algorithms used.
lexsort()
Indirect stable sort with multiple keys.
ndarray.sort()
Inplace sort.
Notes
See sort for notes on the different sorting algorithms.
Examples
>>> a = np.ma.array([3,2,1], mask=[False, False, True])
>>> a
masked_array(data=[3, 2, --], mask=[False, False, True], fill_value=999999)
>>> a.argsort()
array([1, 0, 2])
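The sketch below shows how endwith and fill_value change where the masked entry is placed, using the same array as the example above.

```python
import numpy as np

a = np.ma.array([3, 2, 1], mask=[False, False, True])

print(a.argsort().tolist())               # [1, 0, 2]: masked entry sorted last
print(a.argsort(endwith=False).tolist())  # [2, 1, 0]: masked entry sorted first
print(a.argsort(fill_value=0).tolist())   # [2, 1, 0]: fill_value supersedes endwith
```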
-
astype
(dtype, order='K', casting='unsafe', subok=True, copy=True)
Copy of the array, cast to a specified type.
- Parameters
dtype (str or dtype) – Typecode or data-type to which the array is cast.
order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout order of the result. ‘C’ means C order, ‘F’ means Fortran order, ‘A’ means ‘F’ order if all the arrays are Fortran contiguous, ‘C’ order otherwise, and ‘K’ means as close to the order the array elements appear in memory as possible. Default is ‘K’.
casting ({'no', 'equiv', 'safe', 'same_kind', 'unsafe'}, optional) –
Controls what kind of data casting may occur. Defaults to ‘unsafe’ for backwards compatibility.
‘no’ means the data types should not be cast at all.
‘equiv’ means only byte-order changes are allowed.
‘safe’ means only casts which can preserve values are allowed.
‘same_kind’ means only safe casts or casts within a kind, like float64 to float32, are allowed.
‘unsafe’ means any data conversions may be done.
subok (bool, optional) – If True, then sub-classes will be passed-through (default), otherwise the returned array will be forced to be a base-class array.
copy (bool, optional) – By default, astype always returns a newly allocated array. If this is set to false, and the dtype, order, and subok requirements are satisfied, the input array is returned instead of a copy.
- Returns
arr_t – Unless copy is False and the other conditions for returning the input array are satisfied (see description for copy input parameter), arr_t is a new array of the same shape as the input array, with dtype, order given by dtype, order.
- Return type
ndarray
Notes
Changed in version 1.17.0: Casting between a simple data type and a structured one is possible only for “unsafe” casting. Casting to multiple fields is allowed, but casting from multiple fields is not.
Changed in version 1.9.0: Casting from numeric to string types in ‘safe’ casting mode requires that the string dtype length is long enough to store the max integer/float value converted.
- Raises
ComplexWarning – When casting from complex to float or int. To avoid this, one should use a.real.astype(t).
Examples
>>> x = np.array([1, 2, 2.5])
>>> x
array([1. , 2. , 2.5])
>>> x.astype(int)
array([1, 2, 2])
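A sketch of the casting and copy parameters on a plain float array (values chosen for illustration): a cast within the float kind passes 'same_kind', a float-to-int cast is refused under 'safe', and copy=False returns the input itself when nothing needs to change.

```python
import numpy as np

x = np.array([1.0, 2.5])

# float64 -> float32 stays within the float kind, so 'same_kind' allows it.
y = x.astype(np.float32, casting="same_kind")

# float64 -> int64 could lose information, so 'safe' refuses it.
try:
    x.astype(np.int64, casting="safe")
except TypeError as err:
    print("refused:", err)

# Same dtype, default order, subok satisfied: the input array comes back.
z = x.astype(np.float64, copy=False)
print(z is x)  # True
```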
-
base
Base object if memory is from some other object.
Examples
The base of an array that owns its memory is None:
>>> x = np.array([1,2,3,4])
>>> x.base is None
True
Slicing creates a view, whose memory is shared with x:
>>> y = x[2:]
>>> y.base is x
True
-
property
baseclass
Class of the underlying data (read-only).
-
byteswap
(inplace=False)
Swap the bytes of the array elements.
Toggle between little-endian and big-endian data representation by returning a byteswapped array, optionally swapped in-place.
- Parameters
inplace (bool, optional) – If True, swap bytes in-place; the default is False.
- Returns
out – The byteswapped array. If inplace is True, this is a view to self.
- Return type
ndarray
Examples
>>> A = np.array([1, 256, 8755], dtype=np.int16)
>>> list(map(hex, A))
['0x1', '0x100', '0x2233']
>>> A.byteswap(inplace=True)
array([ 256, 1, 13090], dtype=int16)
>>> list(map(hex, A))
['0x100', '0x1', '0x3322']
Arrays of strings are not swapped:
>>> A = np.array(['ceg', 'fac'])
>>> A.byteswap()
Traceback (most recent call last):
    ...
UnicodeDecodeError: ...
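byteswap changes the stored bytes but not the dtype, so the values change. To switch byte order while keeping the values, one common pattern (sketched here) pairs it with a view through a dtype whose byte order has been flipped with dtype.newbyteorder().

```python
import numpy as np

a = np.array([1, 256], dtype=np.int16)

swapped = a.byteswap()   # bytes reversed, dtype unchanged: values change
print(swapped.tolist())  # [256, 1]

# Reading the swapped bytes with the opposite byte order recovers the values.
restored = swapped.view(swapped.dtype.newbyteorder())
print(restored.tolist())  # [1, 256]
```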
-
choose
(choices, out=None, mode='raise')
Use an index array to construct a new array from a set of choices.
Refer to numpy.choose for full documentation.
See also
numpy.choose()
equivalent function
-
clip
(min=None, max=None, out=None, **kwargs)
Return an array whose values are limited to [min, max]. One of max or min must be given.
Refer to numpy.clip for full documentation.
See also
numpy.clip()
equivalent function
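A sketch of clip on a masked array: unmasked values are limited to the interval, and in numpy.ma the mask is carried over unchanged, so the masked entry stays masked.

```python
import numpy as np

m = np.ma.array([1.0, 5.0, 10.0], mask=[False, False, True])
c = m.clip(2.0, 6.0)

print(c.compressed().tolist())  # [2.0, 5.0]
print(c.mask.tolist())          # [False, False, True]
```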
-
compress
(condition, axis=None, out=None)
Return a where condition is True.
If condition is a MaskedArray, missing values are considered as False.
- Parameters
condition (var) – Boolean 1-d array selecting which entries to return. If len(condition) is less than the size of a along the axis, then output is truncated to length of condition array.
axis ({None, int}, optional) – Axis along which the operation must be performed.
out ({None, ndarray}, optional) – Alternative output array in which to place the result. It must have the same shape as the expected output but the type will be cast if necessary.
- Returns
result – A MaskedArray object.
- Return type
MaskedArray
Notes
Please note the difference with compressed()! The output of compress() has a mask, the output of compressed() does not.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4)
>>> x
masked_array(data=[[1, --, 3], [--, 5, --], [7, --, 9]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999)
>>> x.compress([1, 0, 1])
masked_array(data=[1, 3], mask=[False, False], fill_value=999999)
>>> x.compress([1, 0, 1], axis=1)
masked_array(data=[[1, 3], [--, --], [7, 9]], mask=[[False, False], [ True, True], [False, False]], fill_value=999999)
-
compressed
()
Return all the non-masked data as a 1-D array.
- Returns
data – A new ndarray holding the non-masked data is returned.
- Return type
ndarray
Notes
The result is not a MaskedArray!
Examples
>>> x = np.ma.array(np.arange(5), mask=[0]*2 + [1]*3)
>>> x.compressed()
array([0, 1])
>>> type(x.compressed())
<class 'numpy.ndarray'>
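The difference between compress() and compressed() noted above, side by side in a minimal sketch: compress() selects by condition and keeps the mask, while compressed() drops every masked value and returns a plain ndarray.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])

kept = x.compress([True, True, False])  # selection by condition; mask kept
flat = x.compressed()                   # all masked values dropped

print(type(kept).__name__, kept.mask.tolist())  # MaskedArray [False, True]
print(type(flat).__name__, flat.tolist())       # ndarray [1, 3]
```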
-
conj
()
Complex-conjugate all elements.
Refer to numpy.conjugate for full documentation.
See also
numpy.conjugate()
equivalent function
-
conjugate
()
Return the complex conjugate, element-wise.
Refer to numpy.conjugate for full documentation.
See also
numpy.conjugate()
equivalent function
-
copy
(order='C')
Return a copy of the array.
- Parameters
order ({'C', 'F', 'A', 'K'}, optional) – Controls the memory layout of the copy. ‘C’ means C-order, ‘F’ means F-order, ‘A’ means ‘F’ if a is Fortran contiguous, ‘C’ otherwise. ‘K’ means match the layout of a as closely as possible. (Note that this function and numpy.copy() are very similar, but have different default values for their order= arguments.)
See also
numpy.copy(), numpy.copyto()
Examples
>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> y = x.copy()
>>> x.fill(0)
>>> x
array([[0, 0, 0], [0, 0, 0]])
>>> y
array([[1, 2, 3], [4, 5, 6]])
>>> y.flags['C_CONTIGUOUS']
True
-
count
(axis=None, keepdims=<no value>)
Count the non-masked elements of the array along the given axis.
- Parameters
axis (None or int or tuple of ints, optional) –
Axis or axes along which the count is performed. The default (axis = None) performs the count over all the dimensions of the input array. axis may be negative, in which case it counts from the last to the first axis.
New in version 1.10.0.
If this is a tuple of ints, the count is performed on multiple axes, instead of a single axis or all the axes as before.
keepdims (bool, optional) – If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the array.
- Returns
result – An array with the same shape as the input array, with the specified axis removed. If the array is a 0-d array, or if axis is None, a scalar is returned.
- Return type
ndarray or scalar
See also
count_masked()
Count masked elements in array or along a given axis.
Examples
>>> import numpy.ma as ma
>>> a = ma.arange(6).reshape((2, 3))
>>> a[1, :] = ma.masked
>>> a
masked_array(data=[[0, 1, 2], [--, --, --]], mask=[[False, False, False], [ True, True, True]], fill_value=999999)
>>> a.count()
3
When the axis keyword is specified an array of appropriate size is returned.
>>> a.count(axis=0)
array([1, 1, 1])
>>> a.count(axis=1)
array([3, 0])
-
ctypes
An object to simplify the interaction of the array with the ctypes module.
This attribute creates an object that makes it easier to use arrays when calling shared libraries with the ctypes module. The returned object has, among others, data, shape, and strides attributes (see Notes below) which themselves return ctypes objects that can be used as arguments to a shared library.
- Parameters
None –
- Returns
c – Possessing attributes data, shape, strides, etc.
- Return type
Python object
See also
numpy.ctypeslib
Notes
Below are the public attributes of this object which were documented in “Guide to NumPy” (we have omitted undocumented public attributes, as well as documented private attributes):
-
_ctypes.
data
A pointer to the memory area of the array as a Python integer. This memory area may contain data that is not aligned, or not in correct byte-order. The memory area may not even be writeable. The array flags and data-type of this array should be respected when passing this attribute to arbitrary C-code to avoid trouble that can include Python crashing. User Beware! The value of this attribute is exactly the same as self.__array_interface__['data'][0].
Note that unlike data_as, a reference will not be kept to the array: code like ctypes.c_void_p((a + b).ctypes.data) will result in a pointer to a deallocated array, and should be spelt (a + b).ctypes.data_as(ctypes.c_void_p).
-
_ctypes.
shape
A ctypes array of length self.ndim where the basetype is the C-integer corresponding to dtype('p') on this platform. This base-type could be ctypes.c_int, ctypes.c_long, or ctypes.c_longlong depending on the platform. The c_intp type is defined accordingly in numpy.ctypeslib. The ctypes array contains the shape of the underlying array.
- Type
(c_intp*self.ndim)
-
_ctypes.
strides
A ctypes array of length self.ndim where the basetype is the same as for the shape attribute. This ctypes array contains the strides information from the underlying array. This strides information is important for showing how many bytes must be jumped to get to the next element in the array.
- Type
(c_intp*self.ndim)
-
_ctypes.
data_as
(obj) Return the data pointer cast to a particular c-types object. For example, calling self._as_parameter_ is equivalent to self.data_as(ctypes.c_void_p). Perhaps you want to use the data as a pointer to a ctypes array of floating-point data: self.data_as(ctypes.POINTER(ctypes.c_double)).
The returned pointer will keep a reference to the array.
-
_ctypes.
shape_as
(obj) Return the shape tuple as an array of some other c-types type. For example: self.shape_as(ctypes.c_short).
-
_ctypes.
strides_as
(obj) Return the strides tuple as an array of some other c-types type. For example: self.strides_as(ctypes.c_longlong).
If the ctypes module is not available, then the ctypes attribute of array objects still returns something useful, but ctypes objects are not returned and errors may be raised instead. In particular, the object will still have the _as_parameter_ attribute, which will return an integer equal to the data attribute.
Examples
>>> import ctypes
>>> x
array([[0, 1], [2, 3]])
>>> x.ctypes.data
30439712
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_long))
<ctypes.LP_c_long object at 0x01F01300>
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_long)).contents
c_long(0)
>>> x.ctypes.data_as(ctypes.POINTER(ctypes.c_longlong)).contents
c_longlong(4294967296L)
>>> x.ctypes.shape
<numpy.core._internal.c_long_Array_2 object at 0x01FFD580>
>>> x.ctypes.shape_as(ctypes.c_long)
<numpy.core._internal.c_long_Array_2 object at 0x01FCE620>
>>> x.ctypes.strides
<numpy.core._internal.c_long_Array_2 object at 0x01FCE620>
>>> x.ctypes.strides_as(ctypes.c_longlong)
<numpy.core._internal.c_longlong_Array_2 object at 0x01F01300>
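A sketch of the ctypes attributes on a small array. The dtype is fixed to int64 here (an assumption for the sketch) so that the pointer type c_int64 matches the element size on every platform; the first line also checks that data equals the first entry of __array_interface__['data'].

```python
import ctypes

import numpy as np

x = np.array([[0, 1], [2, 3]], dtype=np.int64)

# data is a plain integer address, identical to the array interface pointer.
print(x.ctypes.data == x.__array_interface__["data"][0])  # True

# data_as keeps a reference to x, so the pointer stays valid while p is alive.
p = x.ctypes.data_as(ctypes.POINTER(ctypes.c_int64))
print(p[0], p[3])  # 0 3: elements read in C order through the pointer

print(list(x.ctypes.shape_as(ctypes.c_int64)))    # [2, 2]
print(list(x.ctypes.strides_as(ctypes.c_int64)))  # [16, 8]
```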
-
cumprod
(axis=None, dtype=None, out=None)
Return the cumulative product of the array elements over the given axis.
Masked values are set to 1 internally during the computation. However, their position is saved, and the result will be masked at the same locations.
Refer to numpy.cumprod for full documentation.
Notes
The mask is lost if out is not a valid MaskedArray!
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
ndarray.cumprod()
corresponding function for ndarrays
numpy.cumprod()
equivalent function
-
cumsum
(axis=None, dtype=None, out=None)
Return the cumulative sum of the array elements over the given axis.
Masked values are set to 0 internally during the computation. However, their position is saved, and the result will be masked at the same locations.
Refer to numpy.cumsum for full documentation.
Notes
The mask is lost if out is not a valid MaskedArray!
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
ndarray.cumsum()
corresponding function for ndarrays
numpy.cumsum()
equivalent function
Examples
>>> marr = np.ma.array(np.arange(10), mask=[0,0,0,1,1,1,0,0,0,0])
>>> marr.cumsum()
masked_array(data=[0, 1, 3, --, --, --, 9, 16, 24, 33], mask=[False, False, False, True, True, True, False, False, False, False], fill_value=999999)
-
property
data
Returns the underlying data, as a view of the masked array.
If the underlying data is a subclass of numpy.ndarray, it is returned as such.
>>> x = np.ma.array(np.matrix([[1, 2], [3, 4]]), mask=[[0, 1], [1, 0]])
>>> x.data
matrix([[1, 2], [3, 4]])
The type of the data can be accessed through the baseclass attribute.
-
diagonal
(offset=0, axis1=0, axis2=1)
Return specified diagonals. In NumPy 1.9 the returned array is a read-only view instead of a copy as in previous NumPy versions. In a future version the read-only restriction will be removed.
Refer to numpy.diagonal() for full documentation.
See also
numpy.diagonal()
equivalent function
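A sketch of the read-only behaviour on a plain ndarray: the diagonal comes back as a non-writeable view, so take a copy when the values need to be modified.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

d = a.diagonal()
print(d.tolist())         # [0, 4, 8]
print(d.flags.writeable)  # False: a read-only view of a

d2 = a.diagonal(offset=1)  # first super-diagonal
print(d2.tolist())         # [1, 5]

d_copy = d.copy()  # a copy is writeable and independent of a
d_copy[0] = 100
print(a[0, 0])     # 0: the original array is untouched
```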
-
dot
(b, out=None, strict=False)
Masked dot product of two arrays. Note that out and strict are located in different positions than in ma.dot. In order to maintain compatibility with the functional version, it is recommended that the optional arguments be treated as keyword only. At some point that may be mandatory.
New in version 1.10.0.
- Parameters
b (masked_array_like) – Input array.
out (masked_array, optional) – Output argument. This must have the exact kind that would be returned if it was not used. In particular, it must have the right type, must be C-contiguous, and its dtype must be the dtype that would be returned for ma.dot(a,b). This is a performance feature. Therefore, if these conditions are not met, an exception is raised, instead of attempting to be flexible.
strict (bool, optional) –
Whether masked data are propagated (True) or set to 0 (False) for the computation. Default is False. Propagating the mask means that if a masked value appears in a row or column, the whole row or column is considered masked.
New in version 1.10.2.
See also
numpy.ma.dot()
equivalent function
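A sketch of the strict flag, using a 2 by 2 masked matrix multiplied by the identity (matrices chosen only for illustration). With strict=False the masked value is treated as 0; with strict=True the mask spreads over the whole row it appears in.

```python
import numpy as np

a = np.ma.array([[1, 2], [3, 4]], mask=[[False, True], [False, False]])
b = np.ma.array([[1, 0], [0, 1]])  # identity, no mask

loose = a.dot(b)                # masked 2 counted as 0
strict = a.dot(b, strict=True)  # row 0 of a becomes fully masked

print(loose.tolist())   # [[1, 0], [3, 4]]
print(strict.tolist())  # [[None, None], [3, 4]]
```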
-
property
dtype
Data-type of the array’s elements.
- Parameters
None –
- Returns
d
- Return type
numpy dtype object
See also
numpy.dtype
Examples
>>> x
array([[0, 1], [2, 3]])
>>> x.dtype
dtype('int32')
>>> type(x.dtype)
<type 'numpy.dtype'>
-
dump
(file)
Dump a pickle of the array to the specified file. The array can be read back with pickle.load or numpy.load (the latter requires allow_pickle=True).
- Parameters
file (str or Path) –
A string naming the dump file.
Changed in version 1.17.0: pathlib.Path objects are now accepted.
-
dumps
()
Returns the pickle of the array as a string. pickle.loads or numpy.loads will convert the string back to an array.
- Parameters
None –
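A sketch of a pickle round trip for a masked array through dumps() and dump(); the file name m.pkl is arbitrary. The mask survives the round trip. Only unpickle data from trusted sources, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

import numpy as np

m = np.ma.array([1.0, 2.0, 3.0], mask=[False, True, False])

# dumps() returns the pickle as bytes; pickle.loads rebuilds the array.
restored = pickle.loads(m.dumps())
print(restored.mask.tolist())  # [False, True, False]

# dump() writes the same pickle to a file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "m.pkl")
    m.dump(path)
    with open(path, "rb") as fh:
        from_file = pickle.load(fh)

print(from_file.compressed().tolist())  # [1.0, 3.0]
```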
-
fill
(value)
Fill the array with a scalar value.
- Parameters
value (scalar) – All elements of a will be assigned this value.
Examples
>>> a = np.array([1, 2])
>>> a.fill(0)
>>> a
array([0, 0])
>>> a = np.empty(2)
>>> a.fill(1)
>>> a
array([1., 1.])
-
property
fill_value
The filling value of the masked array is a scalar. When setting, None will set to a default based on the data type.
Examples
>>> for dt in [np.int32, np.int64, np.float64, np.complex128]:
...     np.ma.array([0, 1], dtype=dt).get_fill_value()
...
999999
999999
1e+20
(1e+20+0j)
>>> x = np.ma.array([0, 1.], fill_value=-np.inf)
>>> x.fill_value
-inf
>>> x.fill_value = np.pi
>>> x.fill_value
3.1415926535897931 # may vary
Reset to default:
>>> x.fill_value = None
>>> x.fill_value
1e+20
-
filled
(fill_value=None)
Return a copy of self, with masked values filled with a given value. However, if there are no masked values to fill, self will be returned instead as an ndarray.
- Parameters
fill_value (scalar, optional) – The value to use for invalid entries (None by default). If None, the fill_value attribute of the array is used instead.
- Returns
filled_array – A copy of self with invalid entries replaced by fill_value (be it the function argument or the attribute of self), or self itself as an ndarray if there are no invalid entries to be replaced.
- Return type
ndarray
Notes
The result is not a MaskedArray!
Examples
>>> x = np.ma.array([1,2,3,4,5], mask=[0,0,1,0,1], fill_value=-999)
>>> x.filled()
array([ 1, 2, -999, 4, -999])
>>> type(x.filled())
<class 'numpy.ndarray'>
Subclassing is preserved. This means that if, e.g., the data part of the masked array is a recarray, filled returns a recarray:
>>> x = np.array([(-1, 2), (-3, 4)], dtype='i8,i8').view(np.recarray)
>>> m = np.ma.array(x, mask=[(True, False), (False, True)])
>>> m.filled()
rec.array([(999999, 2), ( -3, 999999)], dtype=[('f0', '<i8'), ('f1', '<i8')])
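A sketch of filled() with an explicit NaN fill value. This can be handy before passing a matrix to code that does not understand masks, such as a plotting routine; that use case is an illustration, not something this page prescribes.

```python
import numpy as np

m = np.ma.array([[1.0, 2.0], [3.0, 4.0]], mask=[[False, True], [False, False]])

plain = m.filled(np.nan)         # masked cells become NaN
print(type(plain).__name__)      # ndarray
print(np.isnan(plain).tolist())  # [[False, True], [False, False]]

# Without an argument, the array's own fill_value is used (1e+20 for floats).
print(m.filled().tolist())       # [[1.0, 1e+20], [3.0, 4.0]]
```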
-
flags
Information about the memory layout of the array.
-
C_CONTIGUOUS
(C) The data is in a single, C-style contiguous segment.
-
F_CONTIGUOUS
(F) The data is in a single, Fortran-style contiguous segment.
-
OWNDATA
(O) The array owns the memory it uses or borrows it from another object.
-
WRITEABLE
(W) The data area can be written to. Setting this to False locks the data, making it read-only. A view (slice, etc.) inherits WRITEABLE from its base array at creation time, but a view of a writeable array may be subsequently locked while the base array remains writeable. (The opposite is not true, in that a view of a locked array may not be made writeable. However, currently, locking a base object does not lock any views that already reference it, so under that circumstance it is possible to alter the contents of a locked array via a previously created writeable view onto it.) Attempting to change a non-writeable array raises a ValueError exception.
-
ALIGNED
(A) The data and all elements are aligned appropriately for the hardware.
-
WRITEBACKIFCOPY
(X) This array is a copy of some other array. The C-API function PyArray_ResolveWritebackIfCopy must be called before deallocating so that the base array will be updated with the contents of this array.
-
UPDATEIFCOPY
(U) (Deprecated, use WRITEBACKIFCOPY) This array is a copy of some other array. When this array is deallocated, the base array will be updated with the contents of this array.
-
FNC
F_CONTIGUOUS and not C_CONTIGUOUS.
-
FORC
F_CONTIGUOUS or C_CONTIGUOUS (one-segment test).
-
BEHAVED
(B) ALIGNED and WRITEABLE.
-
CARRAY
(CA) BEHAVED and C_CONTIGUOUS.
-
FARRAY
(FA) BEHAVED and F_CONTIGUOUS and not C_CONTIGUOUS.
Notes
The flags object can be accessed dictionary-like (as in a.flags['WRITEABLE']), or by using lowercased attribute names (as in a.flags.writeable). Short flag names are only supported in dictionary access.
Only the WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED flags can be changed by the user, via direct assignment to the attribute or dictionary entry, or by calling ndarray.setflags.
The array flags cannot be set arbitrarily:
UPDATEIFCOPY can only be set False.
WRITEBACKIFCOPY can only be set False.
ALIGNED can only be set True if the data is truly aligned.
WRITEABLE can only be set True if the array owns its own memory or the ultimate owner of the memory exposes a writeable buffer interface or is a string.
Arrays can be both C-style and Fortran-style contiguous simultaneously. This is clear for 1-dimensional arrays, but can also be true for higher dimensional arrays.
Even for contiguous arrays a stride for a given dimension arr.strides[dim] may be arbitrary if arr.shape[dim] == 1 or the array has no elements. It does not generally hold that self.strides[-1] == self.itemsize for C-style contiguous arrays, or that self.strides[0] == self.itemsize for Fortran-style contiguous arrays.
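A sketch of the WRITEABLE rules on a plain ndarray: a view created after locking inherits the lock, a write to the locked array raises ValueError, and the view cannot be made writeable again.

```python
import numpy as np

base = np.arange(4)
base.flags.writeable = False  # lock the array

view = base[1:]
print(view.flags.writeable)   # False: the view inherits the lock

try:
    base[0] = 10
except ValueError as err:     # writes to a locked array are refused
    print("write refused:", err)

try:
    view.flags.writeable = True
except ValueError as err:     # a view of a locked array cannot be unlocked
    print("unlock refused:", err)
```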
-
property
flat
Return a flat iterator, or set a flattened version of self to value.
-
flatten
(order='C')
Return a copy of the array collapsed into one dimension.
- Parameters
order ({'C', 'F', 'A', 'K'}, optional) – ‘C’ means to flatten in row-major (C-style) order. ‘F’ means to flatten in column-major (Fortran-style) order. ‘A’ means to flatten in column-major order if a is Fortran contiguous in memory, row-major order otherwise. ‘K’ means to flatten a in the order the elements occur in memory. The default is ‘C’.
- Returns
y – A copy of the input array, flattened to one dimension.
- Return type
ndarray
Examples
>>> a = np.array([[1,2], [3,4]])
>>> a.flatten()
array([1, 2, 3, 4])
>>> a.flatten('F')
array([1, 3, 2, 4])
-
get_fill_value
()
The filling value of the masked array is a scalar. When setting, None will set to a default based on the data type.
Examples
>>> for dt in [np.int32, np.int64, np.float64, np.complex128]:
...     np.ma.array([0, 1], dtype=dt).get_fill_value()
...
999999
999999
1e+20
(1e+20+0j)
>>> x = np.ma.array([0, 1.], fill_value=-np.inf)
>>> x.fill_value
-inf
>>> x.fill_value = np.pi
>>> x.fill_value
3.1415926535897931 # may vary
Reset to default:
>>> x.fill_value = None
>>> x.fill_value
1e+20
-
get_imag
()
The imaginary part of the masked array.
This property is a view on the imaginary part of this MaskedArray.
See also
real
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.imag
masked_array(data=[1.0, --, 1.6], mask=[False, True, False], fill_value=1e+20)
-
get_real
()
The real part of the masked array.
This property is a view on the real part of this MaskedArray.
See also
imag
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.real
masked_array(data=[1.0, --, 3.45], mask=[False, True, False], fill_value=1e+20)
-
getfield
(dtype, offset=0)
Returns a field of the given array as a certain type.
A field is a view of the array data with a given data-type. The values in the view are determined by the given type and the offset into the current array in bytes. The offset needs to be such that the view dtype fits in the array dtype; for example an array of dtype complex128 has 16-byte elements. If taking a view with a 32-bit integer (4 bytes), the offset needs to be between 0 and 12 bytes.
- Parameters
dtype (str or dtype) – The data type of the view. The dtype size of the view can not be larger than that of the array itself.
offset (int) – Number of bytes to skip before beginning the element view.
Examples
>>> x = np.diag([1.+1.j]*2)
>>> x[1, 1] = 2 + 4.j
>>> x
array([[1.+1.j, 0.+0.j], [0.+0.j, 2.+4.j]])
>>> x.getfield(np.float64)
array([[1., 0.], [0., 2.]])
By choosing an offset of 8 bytes we can select the imaginary part of the array for our view:
>>> x.getfield(np.float64, offset=8)
array([[1., 0.], [0., 4.]])
-
harden_mask
()
Force the mask to hard.
Whether the mask of a masked array is hard or soft is determined by its hardmask property. harden_mask sets hardmask to True.
See also
hardmask
-
property
hardmask
Hardness of the mask.
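A sketch contrasting a soft and a hard mask on assignment: with a soft mask (the default), assigning to a masked element unmasks it; after harden_mask(), the masked element keeps both its mask and its original data.

```python
import numpy as np

soft = np.ma.array([1, 2, 3], mask=[False, True, False])
soft[1] = 20               # soft mask: the assignment unmasks element 1
print(soft.mask.tolist())  # [False, False, False]

hard = np.ma.array([1, 2, 3], mask=[False, True, False])
hard.harden_mask()
hard[1] = 20               # hard mask: element 1 is left untouched
print(hard.mask.tolist())  # [False, True, False]
print(hard.data.tolist())  # [1, 2, 3]
```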
-
ids
()
Return the addresses of the data and mask areas.
- Parameters
None –
Examples
>>> x = np.ma.array([1, 2, 3], mask=[0, 1, 1])
>>> x.ids()
(166670640, 166659832) # may vary
If the array has no mask, the address of nomask is returned. This address is typically not close to the data in memory:
>>> x = np.ma.array([1, 2, 3])
>>> x.ids()
(166691080, 3083169284L) # may vary
-
property
imag
The imaginary part of the masked array.
This property is a view on the imaginary part of this MaskedArray.
See also
real
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False])
>>> x.imag
masked_array(data=[1.0, --, 1.6], mask=[False, True, False], fill_value=1e+20)
-
iscontiguous
()
Return a boolean indicating whether the data is contiguous.
- Parameters
None –
Examples
>>> x = np.ma.array([1, 2, 3])
>>> x.iscontiguous()
True
iscontiguous returns one of the flags of the masked array:
>>> x.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
-
item
(*args)
Copy an element of an array to a standard Python scalar and return it.
- Parameters
*args (Arguments (variable number and type)) –
none: in this case, the method only works for arrays with one element (a.size == 1), which element is copied into a standard Python scalar object and returned.
int_type: this argument is interpreted as a flat index into the array, specifying which element to copy and return.
tuple of int_types: functions as does a single int_type argument, except that the argument is interpreted as an nd-index into the array.
- Returns
z – A copy of the specified element of the array as a suitable Python scalar
- Return type
Standard Python scalar object
Notes
When the data type of a is longdouble or clongdouble, item() returns a scalar array object because there is no available Python scalar that would not lose information. Void arrays return a buffer object for item(), unless fields are defined, in which case a tuple is returned.
item is very similar to a[args], except, instead of an array scalar, a standard Python scalar is returned. This can be useful for speeding up access to elements of the array and doing arithmetic on elements of the array using Python’s optimized math.
Examples
>>> np.random.seed(123) >>> x = np.random.randint(9, size=(3, 3)) >>> x array([[2, 2, 6], [1, 3, 6], [1, 0, 1]]) >>> x.item(3) 1 >>> x.item(7) 0 >>> x.item((0, 1)) 2 >>> x.item((2, 2)) 1
-
itemset
(*args)¶ Insert scalar into an array (scalar is cast to array’s dtype, if possible)
There must be at least 1 argument; the last argument is taken as item. Then
a.itemset(*args)
is equivalent to, but faster than, a[args] = item. The item should be a scalar value and args must select a single item in the array a.
- Parameters
*args (Arguments) – If one argument: a scalar, only used in case a is of size 1. If two arguments: the last argument is the value to be set and must be a scalar, the first argument specifies a single array element location. It is either an int or a tuple.
Notes
Compared to indexing syntax, itemset provides some speed increase for placing a scalar into a particular location in an ndarray, if you must do this. However, generally this is discouraged: among other problems, it complicates the appearance of the code. Also, when using itemset (and item) inside a loop, be sure to assign the methods to a local variable to avoid the attribute look-up at each loop iteration.
Examples
>>> np.random.seed(123) >>> x = np.random.randint(9, size=(3, 3)) >>> x array([[2, 2, 6], [1, 3, 6], [1, 0, 1]]) >>> x.itemset(4, 0) >>> x.itemset((2, 2), 9) >>> x array([[2, 2, 6], [1, 0, 6], [1, 0, 9]])
-
itemsize
¶ Length of one array element in bytes.
Examples
>>> x = np.array([1,2,3], dtype=np.float64) >>> x.itemsize 8 >>> x = np.array([1,2,3], dtype=np.complex128) >>> x.itemsize 16
-
property
mask
¶ Current mask.
-
max
(axis=None, out=None, fill_value=None, keepdims=<no value>)¶ Return the maximum along a given axis.
- Parameters
axis ({None, int}, optional) – Axis along which to operate. By default, axis is None and the flattened input is used.
out (array_like, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.
fill_value ({var}, optional) – Value used to fill in the masked values. If None, use the output of maximum_fill_value().
- Returns
amax – New array holding the result. If out was specified, out is returned.
- Return type
array_like
See also
maximum_fill_value()
Returns the maximum filling value for a given datatype.
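A small illustrative sketch (values chosen arbitrarily) showing that a masked entry never wins the maximum, both for the flattened array and along an axis:

```python
import numpy as np

# A 2x3 array whose largest value (9) is masked out.
x = np.ma.array([[1, 9, 3],
                 [4, 5, 6]],
                mask=[[False, True, False],
                      [False, False, False]])

overall = x.max()           # flattened input; the masked 9 is ignored
per_column = x.max(axis=0)  # one maximum per column

print(overall)     # 6
print(per_column)  # [4 5 6]
```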
-
mean
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Returns the average of the array elements along given axis.
Masked entries are ignored, and result elements which are not finite will be masked.
Refer to numpy.mean for full documentation.
See also
ndarray.mean()
corresponding function for ndarrays
numpy.mean()
Equivalent function
numpy.ma.average()
Weighted average.
Examples
>>> a = np.ma.array([1,2,3], mask=[False, False, True]) >>> a masked_array(data=[1, 2, --], mask=[False, False, True], fill_value=999999) >>> a.mean() 1.5
-
min
(axis=None, out=None, fill_value=None, keepdims=<no value>)¶ Return the minimum along a given axis.
- Parameters
axis ({None, int}, optional) – Axis along which to operate. By default, axis is None and the flattened input is used.
out (array_like, optional) – Alternative output array in which to place the result. Must be of the same shape and buffer length as the expected output.
fill_value ({var}, optional) – Value used to fill in the masked values. If None, use the output of minimum_fill_value.
- Returns
amin – New array holding the result. If out was specified, out is returned.
- Return type
array_like
See also
minimum_fill_value()
Returns the minimum filling value for a given datatype.
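A sketch of how fill_value interacts with the reduction (illustrative values): by default the masked slot is filled with a value that cannot win, so it is effectively skipped; an explicit fill_value is substituted first and then takes part in the minimum.

```python
import numpy as np

x = np.ma.array([5, 1, 3], mask=[False, True, False])

default_min = x.min()             # masked 1 is skipped -> 3
filled_min = x.min(fill_value=0)  # masked slot is filled with 0 first -> 0

print(default_min, filled_min)
```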
-
mini
(axis=None)¶ Return the array minimum along the specified axis.
Deprecated since version 1.13.0: This function is identical to both:
self.min(keepdims=True, axis=axis).squeeze(axis=axis)
np.ma.minimum.reduce(self, axis=axis)
Typically, though, self.min(axis=axis) is sufficient.
- Parameters
axis (int, optional) – The axis along which to find the minima. Default is None, in which case the minimum value in the whole array is returned.
- Returns
min – If axis is None, the result is a scalar. Otherwise, if axis is given and the array is at least 2-D, the result is a masked array with dimension one smaller than the array on which mini is called.
- Return type
scalar or MaskedArray
Examples
>>> x = np.ma.array(np.arange(6), mask=[0 ,1, 0, 0, 0 ,1]).reshape(3, 2) >>> x masked_array( data=[[0, --], [2, 3], [4, --]], mask=[[False, True], [False, False], [False, True]], fill_value=999999) >>> x.mini() masked_array(data=0, mask=False, fill_value=999999) >>> x.mini(axis=0) masked_array(data=[0, 3], mask=[False, False], fill_value=999999) >>> x.mini(axis=1) masked_array(data=[0, 2, 4], mask=[False, False, False], fill_value=999999)
There is a small difference between mini and min:
>>> x[:,1].mini(axis=0) masked_array(data=3, mask=False, fill_value=999999) >>> x[:,1].min(axis=0) 3
-
nbytes
¶ Total bytes consumed by the elements of the array.
Notes
Does not include memory consumed by non-element attributes of the array object.
Examples
>>> x = np.zeros((3,5,2), dtype=np.complex128) >>> x.nbytes 480 >>> np.prod(x.shape) * x.itemsize 480
-
ndim
¶ Number of array dimensions.
Examples
>>> x = np.array([1, 2, 3]) >>> x.ndim 1 >>> y = np.zeros((2, 3, 4)) >>> y.ndim 3
-
newbyteorder
(new_order='S')¶ Return the array with the same data viewed with a different byte order.
Equivalent to:
arr.view(arr.dtype.newbyteorder(new_order))
Changes are also made in all fields and sub-arrays of the array data type.
- Parameters
new_order (string, optional) –
Byte order to force; a value from the byte order specifications below. new_order codes can be any of:
’S’ - swap dtype from current to opposite endian
{‘<’, ‘L’} - little endian
{‘>’, ‘B’} - big endian
{‘=’, ‘N’} - native order
{‘|’, ‘I’} - ignore (no change to byte order)
The default value (‘S’) results in swapping the current byte order. The code does a case-insensitive check on the first letter of new_order for the alternatives above. For example, any of ‘B’ or ‘b’ or ‘biggish’ are valid to specify big-endian.
- Returns
new_arr – New array object with the dtype reflecting given change to the byte order.
- Return type
array
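A minimal sketch of the equivalent form quoted above, shown on a plain ndarray so the byte-level effect is easy to check. It calls dtype.newbyteorder rather than the array method, since ndarray.newbyteorder was removed in NumPy 2.0 while the dtype method remains. No bytes move; only their interpretation changes.

```python
import numpy as np

a = np.array([1, 256], dtype='<i2')          # little-endian int16
swapped = a.view(a.dtype.newbyteorder('S'))  # same bytes, opposite byte order

print(swapped.dtype.str)                 # '>i2'
print(swapped.tolist())                  # [256, 1]
print(a.tobytes() == swapped.tobytes())  # True: the buffer is unchanged
```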
-
nonzero
()¶ Return the indices of unmasked elements that are not zero.
Returns a tuple of arrays, one for each dimension, containing the indices of the non-zero elements in that dimension. The corresponding non-zero values can be obtained with:
a[a.nonzero()]
To group the indices by element, rather than dimension, use instead:
np.transpose(a.nonzero())
The result of this is always a 2d array, with a row for each non-zero element.
- Parameters
None
- Returns
tuple_of_arrays – Indices of elements that are non-zero.
- Return type
tuple
See also
numpy.nonzero()
Function operating on ndarrays.
flatnonzero()
Return indices that are non-zero in the flattened version of the input array.
ndarray.nonzero()
Equivalent ndarray method.
count_nonzero()
Counts the number of non-zero elements in the input array.
Examples
>>> import numpy.ma as ma >>> x = ma.array(np.eye(3)) >>> x masked_array( data=[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], mask=False, fill_value=1e+20) >>> x.nonzero() (array([0, 1, 2]), array([0, 1, 2]))
Masked elements are ignored.
>>> x[1, 1] = ma.masked >>> x masked_array( data=[[1.0, 0.0, 0.0], [0.0, --, 0.0], [0.0, 0.0, 1.0]], mask=[[False, False, False], [False, True, False], [False, False, False]], fill_value=1e+20) >>> x.nonzero() (array([0, 2]), array([0, 2]))
Indices can also be grouped by element.
>>> np.transpose(x.nonzero()) array([[0, 0], [2, 2]])
A common use for nonzero is to find the indices of an array where a condition is True. Given an array a, the condition a > 3 is a boolean array and, since False is interpreted as 0, ma.nonzero(a > 3) yields the indices of a where the condition is true.
>>> a = ma.array([[1,2,3],[4,5,6],[7,8,9]]) >>> a > 3 masked_array( data=[[False, False, False], [ True, True, True], [ True, True, True]], mask=False, fill_value=True) >>> ma.nonzero(a > 3) (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
The nonzero method of the condition array can also be called.
>>> (a > 3).nonzero() (array([1, 1, 1, 2, 2, 2]), array([0, 1, 2, 0, 1, 2]))
-
partition
(kth, axis=-1, kind='introselect', order=None)¶ Rearranges the elements in the array in such a way that the value of the element in kth position is in the position it would be in a sorted array. All elements smaller than the kth element are moved before this element and all equal or greater are moved behind it. The ordering of the elements in the two partitions is undefined.
New in version 1.8.0.
- Parameters
kth (int or sequence of ints) – Element index to partition by. The kth element value will be in its final sorted position and all smaller elements will be moved before it and all equal or greater elements behind it. The order of all elements in the partitions is undefined. If provided with a sequence of kth, it will partition all elements indexed by kth into their sorted positions at once.
axis (int, optional) – Axis along which to sort. Default is -1, which means sort along the last axis.
kind ({'introselect'}, optional) – Selection algorithm. Default is ‘introselect’.
order (str or list of str, optional) – When a is an array with fields defined, this argument specifies which fields to compare first, second, etc. A single field can be specified as a string, and not all fields need to be specified, but unspecified fields will still be used, in the order in which they come up in the dtype, to break ties.
See also
numpy.partition()
Return a partitioned copy of an array.
argpartition()
Indirect partition.
sort()
Full sort.
Notes
See np.partition for notes on the different algorithms.
Examples
>>> a = np.array([3, 4, 2, 1]) >>> a.partition(3) >>> a array([2, 1, 3, 4])
>>> a.partition((1, 3)) >>> a array([1, 2, 3, 4])
-
prod
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Return the product of the array elements over the given axis.
Masked elements are set to 1 internally for computation.
Refer to numpy.prod for full documentation.
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
ndarray.prod()
corresponding function for ndarrays
numpy.prod()
equivalent function
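A short sketch (illustrative values) of the "masked elements are set to 1" rule: the masked entry contributes nothing to the product, both overall and along an axis.

```python
import numpy as np

x = np.ma.array([[1, 2, 3],
                 [4, 5, 6]],
                mask=[[False, True, False],
                      [False, False, False]])

total = x.prod()          # masked 2 counts as 1: 1*3*4*5*6
per_row = x.prod(axis=1)  # [1*3, 4*5*6]

print(total)    # 360
print(per_row)  # [3 120]
```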
-
product
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Return the product of the array elements over the given axis.
Masked elements are set to 1 internally for computation.
Refer to numpy.prod for full documentation.
Notes
Arithmetic is modular when using integer types, and no error is raised on overflow.
See also
ndarray.prod()
corresponding function for ndarrays
numpy.prod()
equivalent function
-
ptp
(axis=None, out=None, fill_value=None, keepdims=False)¶ Return (maximum - minimum) along the given dimension (i.e. peak-to-peak value).
- Parameters
axis ({None, int}, optional) – Axis along which to find the peaks. If None (default) the flattened array is used.
out ({None, array_like}, optional) – Alternative output array in which to place the result. It must have the same shape and buffer length as the expected output but the type will be cast if necessary.
fill_value ({var}, optional) – Value used to fill in the masked values.
- Returns
ptp – A new array holding the result, unless out was specified, in which case a reference to out is returned.
- Return type
ndarray.
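A sketch with made-up values showing that masked entries are excluded from both the maximum and the minimum that make up the peak-to-peak value:

```python
import numpy as np

x = np.ma.array([[4, 9, 2],
                 [1, 7, 8]],
                mask=[[False, True, False],
                      [False, False, False]])

spread = x.ptp()            # unmasked values 4, 2, 1, 7, 8 -> 8 - 1
per_column = x.ptp(axis=0)  # [4 - 1, 7 - 7, 8 - 2]; the masked 9 is ignored

print(spread)      # 7
print(per_column)  # [3 0 6]
```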
-
put
(indices, values, mode='raise')¶ Set storage-indexed locations to corresponding values.
Sets self._data.flat[n] = values[n] for each n in indices. If values is shorter than indices, it is repeated. If values has some masked values, the initial mask is updated accordingly; otherwise the corresponding values are unmasked.
- Parameters
indices (1-D array_like) – Target indices, interpreted as integers.
values (array_like) – Values to place in self._data copy at target indices.
mode ({'raise', 'wrap', 'clip'}, optional) – Specifies how out-of-bounds indices will behave. ‘raise’ : raise an error. ‘wrap’ : wrap around. ‘clip’ : clip to the range.
Notes
values can be a scalar or length 1 array.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4) >>> x masked_array( data=[[1, --, 3], [--, 5, --], [7, --, 9]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999) >>> x.put([0,4,8],[10,20,30]) >>> x masked_array( data=[[10, --, 3], [--, 20, --], [7, --, 30]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999)
>>> x.put(4,999) >>> x masked_array( data=[[10, --, 3], [--, 999, --], [7, --, 30]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999)
-
ravel
(order='C')¶ Returns a 1D version of self, as a view.
- Parameters
order ({'C', 'F', 'A', 'K'}, optional) – The elements of a are read using this index order. ‘C’ means to index the elements in C-like order, with the last axis index changing fastest, back to the first axis index changing slowest. ‘F’ means to index the elements in Fortran-like index order, with the first index changing fastest, and the last index changing slowest. Note that the ‘C’ and ‘F’ options take no account of the memory layout of the underlying array, and only refer to the order of axis indexing. ‘A’ means to read the elements in Fortran-like index order if a is Fortran contiguous in memory, C-like order otherwise. ‘K’ means to read the elements in the order they occur in memory, except for reversing the data when strides are negative. By default, ‘C’ index order is used.
- Returns
Output view of shape (self.size,) (or (np.ma.product(self.shape),)).
- Return type
MaskedArray
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4) >>> x masked_array( data=[[1, --, 3], [--, 5, --], [7, --, 9]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999) >>> x.ravel() masked_array(data=[1, --, 3, --, 5, --, 7, --, 9], mask=[False, True, False, True, False, True, False, True, False], fill_value=999999)
-
property
real
¶ The real part of the masked array.
This property is a view on the real part of this MaskedArray.
See also
imag
Examples
>>> x = np.ma.array([1+1.j, -2j, 3.45+1.6j], mask=[False, True, False]) >>> x.real masked_array(data=[1.0, --, 3.45], mask=[False, True, False], fill_value=1e+20)
-
property
recordmask
Get or set the mask of the array if it has no named fields. For structured arrays, returns an ndarray of booleans where entries are True if all the fields are masked, False otherwise:
>>> x = np.ma.array([(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)], ... mask=[(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)], ... dtype=[('a', int), ('b', int)]) >>> x.recordmask array([False, False, True, False, False])
-
repeat
(repeats, axis=None)¶ Repeat elements of an array.
Refer to numpy.repeat for full documentation.
See also
numpy.repeat()
equivalent function
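A minimal sketch (illustrative values) showing that repeat duplicates each element together with its mask bit:

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])
r = x.repeat(2)  # each element, and its mask bit, appears twice

print(r)                # [1 1 -- -- 3 3]
print(r.mask.tolist())  # [False, False, True, True, False, False]
```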
-
reshape
(*s, **kwargs)¶ Give a new shape to the array without changing its data.
Returns a masked array containing the same data, but with a new shape. The result is a view on the original array; if this is not possible, a ValueError is raised.
- Parameters
shape (int or tuple of ints) – The new shape should be compatible with the original shape. If an integer is supplied, then the result will be a 1-D array of that length.
order ({'C', 'F'}, optional) – Determines whether the array data should be viewed as in C (row-major) or FORTRAN (column-major) order.
- Returns
reshaped_array – A new view on the array.
- Return type
array
See also
reshape()
Equivalent function in the masked array module.
numpy.ndarray.reshape()
Equivalent method on ndarray object.
numpy.reshape()
Equivalent function in the NumPy module.
Notes
The reshaping operation cannot guarantee that a copy will not be made; to modify the shape in place, use a.shape = s.
Examples
>>> x = np.ma.array([[1,2],[3,4]], mask=[1,0,0,1]) >>> x masked_array( data=[[--, 2], [3, --]], mask=[[ True, False], [False, True]], fill_value=999999) >>> x = x.reshape((4,1)) >>> x masked_array( data=[[--], [2], [3], [--]], mask=[[ True], [False], [False], [ True]], fill_value=999999)
-
resize
(newshape, refcheck=True, order=False)¶ Warning
This method does nothing, except raise a ValueError exception. A masked array does not own its data and therefore cannot safely be resized in place. Use the numpy.ma.resize function instead.
This method is difficult to implement safely and may be deprecated in future releases of NumPy.
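A sketch of the workaround the warning recommends (illustrative values): the method always raises, while numpy.ma.resize builds a new array from repeated copies of the data and repeats the mask in the same way.

```python
import numpy as np

x = np.ma.array([1, 2, 3], mask=[False, True, False])

try:
    x.resize((2, 3))  # the method always refuses
    refused = False
except ValueError:
    refused = True

y = np.ma.resize(x, (2, 3))  # new array; data and mask are repeated

print(refused)  # True
print(y)
```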
-
round
(decimals=0, out=None)¶ Return each element rounded to the given number of decimals.
Refer to numpy.around for full documentation.
See also
ndarray.round()
corresponding function for ndarrays
numpy.around()
equivalent function
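A minimal sketch (illustrative values): rounding is applied to the data, and the original mask is carried over to the result.

```python
import numpy as np

x = np.ma.array([1.234, 5.678, 9.999], mask=[False, True, False])
r = x.round(1)  # round to one decimal; the mask is kept

print(r)
```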
-
searchsorted
(v, side='left', sorter=None)¶ Find indices where elements of v should be inserted in a to maintain order.
For full documentation, see numpy.searchsorted.
See also
numpy.searchsorted()
equivalent function
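A minimal sketch on an already sorted array with no masked entries (how masked entries would be treated is not described in this entry, so the example avoids them). side='right' places a value after any equal elements.

```python
import numpy as np

a = np.ma.array([1, 3, 5, 7])  # sorted, no masked entries

i = a.searchsorted(4)                        # 4 goes between 3 and 5
j = a.searchsorted([0, 5, 8], side='right')  # 5 lands after the existing 5

print(i, j)
```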
-
setfield
(val, dtype, offset=0)¶ Put a value into a specified place in a field defined by a data-type.
Place val into a’s field defined by dtype and beginning offset bytes into the field.
- Parameters
val (object) – Value to be placed in field.
dtype (dtype object) – Data-type of the field in which to place val.
offset (int, optional) – The number of bytes into the field at which to place val.
- Return type
None
See also
getfield
Examples
>>> x = np.eye(3) >>> x.getfield(np.float64) array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]]) >>> x.setfield(3, np.int32) >>> x.getfield(np.int32) array([[3, 3, 3], [3, 3, 3], [3, 3, 3]], dtype=int32) >>> x array([[1.0e+000, 1.5e-323, 1.5e-323], [1.5e-323, 1.0e+000, 1.5e-323], [1.5e-323, 1.5e-323, 1.0e+000]]) >>> x.setfield(np.eye(3), np.int32) >>> x array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
-
setflags
(write=None, align=None, uic=None)¶ Set array flags WRITEABLE, ALIGNED, (WRITEBACKIFCOPY and UPDATEIFCOPY), respectively.
These Boolean-valued flags affect how numpy interprets the memory area used by a (see Notes below). The ALIGNED flag can only be set to True if the data is actually aligned according to the type. The WRITEBACKIFCOPY and (deprecated) UPDATEIFCOPY flags can never be set to True. The flag WRITEABLE can only be set to True if the array owns its own memory, or the ultimate owner of the memory exposes a writeable buffer interface, or is a string. (The exception for string is made so that unpickling can be done without copying memory.)
- Parameters
write (bool, optional) – Describes whether or not a can be written to.
align (bool, optional) – Describes whether or not a is aligned properly for its type.
uic (bool, optional) – Describes whether or not a is a copy of another “base” array.
Notes
Array flags provide information about how the memory area used for the array is to be interpreted. There are 7 Boolean flags in use, only four of which can be changed by the user: WRITEBACKIFCOPY, UPDATEIFCOPY, WRITEABLE, and ALIGNED.
WRITEABLE (W) the data area can be written to;
ALIGNED (A) the data and strides are aligned appropriately for the hardware (as determined by the compiler);
UPDATEIFCOPY (U) (deprecated), replaced by WRITEBACKIFCOPY;
WRITEBACKIFCOPY (X) this array is a copy of some other array (referenced by .base). When the C-API function PyArray_ResolveWritebackIfCopy is called, the base array will be updated with the contents of this array.
All flags can be accessed using the single (upper case) letter as well as the full name.
Examples
>>> y = np.array([[3, 1, 7], ... [2, 0, 0], ... [8, 5, 9]]) >>> y array([[3, 1, 7], [2, 0, 0], [8, 5, 9]]) >>> y.flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : True ALIGNED : True WRITEBACKIFCOPY : False UPDATEIFCOPY : False >>> y.setflags(write=0, align=0) >>> y.flags C_CONTIGUOUS : True F_CONTIGUOUS : False OWNDATA : True WRITEABLE : False ALIGNED : False WRITEBACKIFCOPY : False UPDATEIFCOPY : False >>> y.setflags(uic=1) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: cannot set WRITEBACKIFCOPY flag to True
-
property
shape
¶ Tuple of array dimensions.
The shape property is usually used to get the current shape of an array, but may also be used to reshape the array in-place by assigning a tuple of array dimensions to it. As with numpy.reshape, one of the new shape dimensions can be -1, in which case its value is inferred from the size of the array and the remaining dimensions. Reshaping an array in-place will fail if a copy is required.
Examples
>>> x = np.array([1, 2, 3, 4]) >>> x.shape (4,) >>> y = np.zeros((2, 3, 4)) >>> y.shape (2, 3, 4) >>> y.shape = (3, 8) >>> y array([[ 0., 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0., 0.]]) >>> y.shape = (3, 6) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: total size of new array must be unchanged >>> np.zeros((4,2))[::2].shape = (-1,) Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: incompatible shape for a non-contiguous array
See also
numpy.reshape
similar function
ndarray.reshape
similar method
-
property
sharedmask
¶ Share status of the mask (read-only).
-
shrink_mask
()¶ Reduce a mask to nomask when possible.
- Parameters
None
- Return type
None
Examples
>>> x = np.ma.array([[1,2 ], [3, 4]], mask=[0]*4) >>> x.mask array([[False, False], [False, False]]) >>> x.shrink_mask() masked_array( data=[[1, 2], [3, 4]], mask=False, fill_value=999999) >>> x.mask False
-
size
¶ Number of elements in the array.
Equal to np.prod(a.shape), i.e., the product of the array’s dimensions.
Notes
a.size returns a standard arbitrary precision Python integer. This may not be the case with other methods of obtaining the same value (like the suggested np.prod(a.shape), which returns an instance of np.int_), and may be relevant if the value is used further in calculations that may overflow a fixed size integer type.
Examples
>>> x = np.zeros((3, 5, 2), dtype=np.complex128) >>> x.size 30 >>> np.prod(x.shape) 30
-
soften_mask
()¶ Force the mask to soft.
Whether the mask of a masked array is hard or soft is determined by its hardmask property. soften_mask sets hardmask to False.
See also
hardmask
-
sort
(axis=-1, kind=None, order=None, endwith=True, fill_value=None)¶ Sort the array, in-place
- Parameters
a (array_like) – Array to be sorted.
axis (int, optional) – Axis along which to sort. If None, the array is flattened before sorting. The default is -1, which sorts along the last axis.
kind ({'quicksort', 'mergesort', 'heapsort', 'stable'}, optional) – The sorting algorithm used.
order (list, optional) – When a is a structured array, this argument specifies which fields to compare first, second, and so on. This list does not need to include all of the fields.
endwith ({True, False}, optional) – Whether missing values (if any) should be treated as the largest values (True) or the smallest values (False). When the array contains unmasked values sorting at the same extremes of the datatype, the ordering of these values and the masked values is undefined.
fill_value ({var}, optional) – Value used internally for the masked values. If fill_value is not None, it supersedes endwith.
- Returns
None – The array is sorted in place; nothing is returned.
- Return type
None
See also
ndarray.sort()
Method to sort an array in-place.
argsort()
Indirect sort.
lexsort()
Indirect stable sort on multiple keys.
searchsorted()
Find elements in a sorted array.
Notes
See numpy.sort for notes on the different sorting algorithms.
Examples
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0]) >>> # Default >>> a.sort() >>> a masked_array(data=[1, 3, 5, --, --], mask=[False, False, False, True, True], fill_value=999999)
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0]) >>> # Put missing values in the front >>> a.sort(endwith=False) >>> a masked_array(data=[--, --, 1, 3, 5], mask=[ True, True, False, False, False], fill_value=999999)
>>> a = np.ma.array([1, 2, 5, 4, 3],mask=[0, 1, 0, 1, 0]) >>> # fill_value takes over endwith >>> a.sort(endwith=False, fill_value=3) >>> a masked_array(data=[1, --, --, 3, 5], mask=[False, True, True, False, False], fill_value=999999)
-
squeeze
(axis=None)¶ Remove single-dimensional entries from the shape of a.
Refer to numpy.squeeze for full documentation.
See also
numpy.squeeze()
equivalent function
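A minimal sketch (illustrative values): the length-1 axes are dropped from both the data and the mask.

```python
import numpy as np

x = np.ma.array([[[1], [2], [3]]], mask=[[[False], [True], [False]]])
s = x.squeeze()  # drop the length-1 axes; the mask is squeezed as well

print(x.shape, s.shape)  # (1, 3, 1) (3,)
print(s.mask.tolist())   # [False, True, False]
```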
-
std
(axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)¶ Returns the standard deviation of the array elements along given axis.
Masked entries are ignored.
Refer to numpy.std for full documentation.
See also
ndarray.std()
corresponding function for ndarrays
numpy.std()
Equivalent function
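A sketch with made-up values showing that a masked outlier is left out of the computation, and how ddof changes the divisor from N to N - ddof (N counts unmasked entries only):

```python
import numpy as np

x = np.ma.array([2.0, 4.0, 100.0, 6.0], mask=[False, False, True, False])

pop_std = x.std()            # of [2, 4, 6]: sqrt(8 / 3)
sample_std = x.std(ddof=1)   # divisor N - ddof = 2: sqrt(8 / 2) = 2.0

print(pop_std, sample_std)
```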
-
strides
¶ Tuple of bytes to step in each dimension when traversing an array.
The byte offset of element (i[0], i[1], ..., i[n]) in an array a is:
offset = sum(np.array(i) * a.strides)
A more detailed explanation of strides can be found in the “ndarray.rst” file in the NumPy reference guide.
Notes
Imagine an array of 32-bit integers (each 4 bytes):
x = np.array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]], dtype=np.int32)
This array is stored in memory as 40 bytes, one after the other (known as a contiguous block of memory). The strides of an array tell us how many bytes we have to skip in memory to move to the next position along a certain axis. For example, we have to skip 4 bytes (1 value) to move to the next column, but 20 bytes (5 values) to get to the same position in the next row. As such, the strides for the array x will be (20, 4).
See also
numpy.lib.stride_tricks.as_strided
Examples
>>> y = np.reshape(np.arange(2*3*4, dtype=np.int32), (2,3,4)) >>> y array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]], [[12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]], dtype=int32) >>> y.strides (48, 16, 4) >>> y[1,1,1] 17 >>> offset=sum(y.strides * np.array((1,1,1))) >>> offset // y.itemsize 17
>>> x = np.reshape(np.arange(5*6*7*8, dtype=np.int32), (5,6,7,8)).transpose(2,3,1,0) >>> x.strides (32, 4, 224, 1344) >>> i = np.array([3,5,2,2]) >>> offset = sum(i * x.strides) >>> x[3,5,2,2] 813 >>> offset // x.itemsize 813
-
sum
(axis=None, dtype=None, out=None, keepdims=<no value>)¶ Return the sum of the array elements over the given axis.
Masked elements are set to 0 internally.
Refer to numpy.sum for full documentation.
See also
ndarray.sum()
corresponding function for ndarrays
numpy.sum()
equivalent function
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4) >>> x masked_array( data=[[1, --, 3], [--, 5, --], [7, --, 9]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999) >>> x.sum() 25 >>> x.sum(axis=1) masked_array(data=[4, 5, 16], mask=[False, False, False], fill_value=999999) >>> x.sum(axis=0) masked_array(data=[8, 5, 12], mask=[False, False, False], fill_value=999999) >>> print(type(x.sum(axis=0, dtype=np.int64)[0])) <class 'numpy.int64'>
-
swapaxes
(axis1, axis2)¶ Return a view of the array with axis1 and axis2 interchanged.
Refer to numpy.swapaxes for full documentation.
See also
numpy.swapaxes()
equivalent function
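A minimal sketch (illustrative values): swapping the two axes of a 1x3 array gives a 3x1 view, and the mask is rearranged along with the data.

```python
import numpy as np

x = np.ma.array([[1, 2, 3]], mask=[[False, True, False]])
y = x.swapaxes(0, 1)  # shape (1, 3) -> (3, 1)

print(y.shape)          # (3, 1)
print(y.mask.tolist())  # [[False], [True], [False]]
```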
-
take
(indices, axis=None, out=None, mode='raise')¶
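This entry carries no description. Assuming take mirrors numpy.take (the signature matches) and picks the mask bits at the same positions as the data, a minimal sketch with illustrative values:

```python
import numpy as np

x = np.ma.array([10, 20, 30, 40], mask=[False, True, False, False])
t = x.take([0, 1, 3])  # data and mask bits at positions 0, 1 and 3

print(t.mask.tolist())          # [False, True, False]
print(t.compressed().tolist())  # [10, 40]
```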
-
tobytes
(fill_value=None, order='C')¶ Return the array data as a string containing the raw bytes in the array.
The array is filled with a fill value before the string conversion.
New in version 1.9.0.
- Parameters
fill_value (scalar, optional) – Value used to fill in the masked values. Default is None, in which case MaskedArray.fill_value is used.
order ({'C','F','A'}, optional) –
Order of the data item in the copy. Default is ‘C’.
’C’ – C order (row major).
’F’ – Fortran order (column major).
’A’ – Any, current order of array.
None – Same as ‘A’.
Notes
As for ndarray.tobytes, information about the shape, dtype, etc., but also about fill_value, will be lost.
Examples
>>> x = np.ma.array(np.array([[1, 2], [3, 4]]), mask=[[0, 1], [1, 0]]) >>> x.tobytes() b'\x01\x00\x00\x00\x00\x00\x00\x00?B\x0f\x00\x00\x00\x00\x00?B\x0f\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00\x00\x00'
-
tofile
(fid, sep='', format='%s')¶ Save a masked array to a file in binary format.
Warning
This function is not implemented yet.
- Raises
NotImplementedError – When tofile is called.
-
toflex
()¶ Transforms a masked array into a flexible-type array.
The flexible type array that is returned will have two fields:
the _data field stores the _data part of the array.
the _mask field stores the _mask part of the array.
- Parameters
None
- Returns
record – A new flexible-type ndarray with two fields: the first element containing a value, the second element containing the corresponding mask boolean. The returned record shape matches self.shape.
- Return type
ndarray
Notes
A side-effect of transforming a masked array into a flexible ndarray is that meta information (fill_value, …) will be lost.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4) >>> x masked_array( data=[[1, --, 3], [--, 5, --], [7, --, 9]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999) >>> x.toflex() array([[(1, False), (2, True), (3, False)], [(4, True), (5, False), (6, True)], [(7, False), (8, True), (9, False)]], dtype=[('_data', '<i8'), ('_mask', '?')])
-
tolist
(fill_value=None)¶ Return the data portion of the masked array as a hierarchical Python list.
Data items are converted to the nearest compatible Python type. Masked values are converted to fill_value. If fill_value is None, the corresponding entries in the output list will be None.
- Parameters
fill_value (scalar, optional) – The value to use for invalid entries. Default is None.
- Returns
result – The Python list representation of the masked array.
- Return type
list
Examples
>>> x = np.ma.array([[1,2,3], [4,5,6], [7,8,9]], mask=[0] + [1,0]*4) >>> x.tolist() [[1, None, 3], [None, 5, None], [7, None, 9]] >>> x.tolist(-999) [[1, -999, 3], [-999, 5, -999], [7, -999, 9]]
-
torecords
()¶ Transforms a masked array into a flexible-type array.
The flexible type array that is returned will have two fields:
the _data field stores the _data part of the array.
the _mask field stores the _mask part of the array.
- Parameters
None
- Returns
record – A new flexible-type ndarray with two fields: the first element containing a value, the second element containing the corresponding mask boolean. The returned record shape matches self.shape.
- Return type
ndarray
Notes
A side-effect of transforming a masked array into a flexible ndarray is that meta information (fill_value, …) will be lost.
Examples
>>> x = np.ma.array([[1,2,3],[4,5,6],[7,8,9]], mask=[0] + [1,0]*4) >>> x masked_array( data=[[1, --, 3], [--, 5, --], [7, --, 9]], mask=[[False, True, False], [ True, False, True], [False, True, False]], fill_value=999999) >>> x.torecords() array([[(1, False), (2, True), (3, False)], [(4, True), (5, False), (6, True)], [(7, False), (8, True), (9, False)]], dtype=[('_data', '<i8'), ('_mask', '?')])
-
tostring
(fill_value=None, order='C')¶ This function is a compatibility alias for tobytes. Despite its name, it returns bytes, not strings.
-
trace
(offset=0, axis1=0, axis2=1, dtype=None, out=None)¶ Return the sum along diagonals of the array.
Refer to numpy.trace for full documentation.
See also
numpy.trace()
equivalent function
-
transpose
(*axes)¶ Returns a view of the array with axes transposed.
For a 1-D array this has no effect, as a transposed vector is simply the same vector. To convert a 1-D array into a 2-D column vector, an additional dimension must be added: np.atleast_2d(a).T achieves this, as does a[:, np.newaxis]. For a 2-D array, this is a standard matrix transpose. For an n-D array, if axes are given, their order indicates how the axes are permuted (see Examples). If axes are not provided and a.shape = (i[0], i[1], ... i[n-2], i[n-1]), then a.transpose().shape = (i[n-1], i[n-2], ... i[1], i[0]).
- Parameters
axes (None, tuple of ints, or n ints) –
None or no argument: reverses the order of the axes.
tuple of ints: i in the j-th place in the tuple means a’s i-th axis becomes a.transpose()’s j-th axis.
n ints: same as an n-tuple of the same ints (this form is intended simply as a “convenience” alternative to the tuple form)
- Returns
out – View of a, with axes suitably permuted.
- Return type
ndarray
See also
ndarray.T()
Array property returning the array transposed.
ndarray.reshape()
Give a new shape to an array without changing its data.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> a
array([[1, 2],
       [3, 4]])
>>> a.transpose()
array([[1, 3],
       [2, 4]])
>>> a.transpose((1, 0))
array([[1, 3],
       [2, 4]])
>>> a.transpose(1, 0)
array([[1, 3],
       [2, 4]])
-
unshare_mask
()¶ Copy the mask and set the sharedmask flag to False.
Whether the mask is shared between masked arrays can be seen from the sharedmask property. unshare_mask ensures the mask is not shared. A copy of the mask is only made if it was shared.
See also
sharedmask
-
var
(axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>)¶ Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
- Parameters
a (array_like) – Array containing numbers whose variance is desired. If a is not an array, a conversion is attempted.
axis (None or int or tuple of ints, optional) –
Axis or axes along which the variance is computed. The default is to compute the variance of the flattened array.
New in version 1.7.0.
If this is a tuple of ints, a variance is performed over multiple axes, instead of a single axis or all the axes as before.
dtype (data-type, optional) – Type to use in computing the variance. For arrays of integer type the default is float64; for arrays of float types it is the same as the array type.
out (ndarray, optional) – Alternate output array in which to place the result. It must have the same shape as the expected output, but the type is cast if necessary.
ddof (int, optional) – “Delta Degrees of Freedom”: the divisor used in the calculation is N - ddof, where N represents the number of elements. By default ddof is zero.
keepdims (bool, optional) –
If this is set to True, the axes which are reduced are left in the result as dimensions with size one. With this option, the result will broadcast correctly against the input array.
If the default value is passed, then keepdims will not be passed through to the var method of sub-classes of ndarray, however any non-default value will be. If the sub-class’ method does not implement keepdims any exceptions will be raised.
- Returns
variance – If out=None, returns a new array containing the variance; otherwise, a reference to the output array is returned.
- Return type
ndarray, see dtype parameter above
Notes
The variance is the average of the squared deviations from the mean, i.e., var = mean(abs(x - x.mean())**2).
The mean is normally calculated as x.sum() / N, where N = len(x). If, however, ddof is specified, the divisor N - ddof is used instead. In standard statistical practice, ddof=1 provides an unbiased estimator of the variance of a hypothetical infinite population. ddof=0 provides a maximum likelihood estimate of the variance for normally distributed variables.
Note that for complex numbers, the absolute value is taken before squaring, so that the result is always real and nonnegative.
For floating-point input, the variance is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-accuracy accumulator using the dtype keyword can alleviate this issue.
Examples
>>> a = np.array([[1, 2], [3, 4]])
>>> np.var(a)
1.25
>>> np.var(a, axis=0)
array([1.,  1.])
>>> np.var(a, axis=1)
array([0.25,  0.25])
In single precision, var() can be inaccurate:
>>> a = np.zeros((2, 512*512), dtype=np.float32)
>>> a[0, :] = 1.0
>>> a[1, :] = 0.1
>>> np.var(a)
0.20250003
Computing the variance in float64 is more accurate:
>>> np.var(a, dtype=np.float64)
0.20249999932944759 # may vary
>>> ((1-0.55)**2 + (0.1-0.55)**2)/2
0.2025
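The following sketch uses only NumPy to show how masking interacts with ddof on a masked array. Masked entries are excluded from both the sum of squared deviations and the element count N before the N - ddof divisor is applied. The array values are made up for illustration.

```python
import numpy as np

# 2x3 masked array; the masked entry (100.0) is ignored by all statistics
x = np.ma.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 100.0]],
                mask=[[False, False, False],
                      [False, False, True]])

# Population variance (ddof=0) over the 5 unmasked values 1..5:
# mean = 3, squared deviations sum to 10, divisor N = 5
pop_var = x.var()

# Sample variance (ddof=1): same sum of squares, divisor N - 1 = 4
sample_var = x.var(ddof=1)

# Row-wise variance: row 0 uses 1, 2, 3; row 1 uses only 4 and 5
row_var = x.var(axis=1)

print(pop_var, sample_var, row_var)
```

The outlier 100.0 has no influence on any of the results because it is masked. Without the mask, it would dominate the variance of the second row.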
-
view
(dtype=None, type=None)¶ New view of array with the same data.
- Parameters
dtype (data-type or ndarray sub-class, optional) – Data-type descriptor of the returned view, e.g., float32 or int16. The default, None, results in the view having the same data-type as a. This argument can also be specified as an ndarray sub-class, which then specifies the type of the returned object (this is equivalent to setting the type parameter).
type (Python type, optional) – Type of the returned view, e.g., ndarray or matrix. Again, the default None results in type preservation.
Notes
a.view() is used two different ways:
a.view(some_dtype) or a.view(dtype=some_dtype) constructs a view of the array’s memory with a different data-type. This can cause a reinterpretation of the bytes of memory.
a.view(ndarray_subclass) or a.view(type=ndarray_subclass) just returns an instance of ndarray_subclass that looks at the same array (same shape, dtype, etc.). This does not cause a reinterpretation of the memory.
For a.view(some_dtype), if some_dtype has a different number of bytes per entry than the previous dtype (for example, converting a regular array to a structured array), then the behavior of the view cannot be predicted just from the superficial appearance of a (shown by print(a)). It also depends on exactly how a is stored in memory. Therefore, if a is C-ordered versus Fortran-ordered, versus defined as a slice or transpose, etc., the view may give different results.
Examples
>>> x = np.array([(1, 2)], dtype=[('a', np.int8), ('b', np.int8)])
Viewing array data using a different type and dtype:
>>> y = x.view(dtype=np.int16, type=np.matrix)
>>> y
matrix([[513]], dtype=int16)
>>> print(type(y))
<class 'numpy.matrix'>
Creating a view on a structured array so it can be used in calculations
>>> x = np.array([(1, 2),(3,4)], dtype=[('a', np.int8), ('b', np.int8)])
>>> xv = x.view(dtype=np.int8).reshape(-1,2)
>>> xv
array([[1, 2],
       [3, 4]], dtype=int8)
>>> xv.mean(0)
array([2.,  3.])
Making changes to the view changes the underlying array
>>> xv[0,1] = 20
>>> x
array([(1, 20), (3, 4)], dtype=[('a', 'i1'), ('b', 'i1')])
Using a view to convert an array to a recarray:
>>> z = x.view(np.recarray)
>>> z.a
array([1, 3], dtype=int8)
Views share data:
>>> x[0] = (9, 10)
>>> z[0]
(9, 10)
Views that change the dtype size (bytes per entry) should normally be avoided on arrays defined by slices, transposes, fortran-ordering, etc.:
>>> x = np.array([[1,2,3],[4,5,6]], dtype=np.int16)
>>> y = x[:, 0:2]
>>> y
array([[1, 2],
       [4, 5]], dtype=int16)
>>> y.view(dtype=[('width', np.int16), ('length', np.int16)])
Traceback (most recent call last):
    ...
ValueError: To change to a dtype of a different size, the array must be C-contiguous
>>> z = y.copy()
>>> z.view(dtype=[('width', np.int16), ('length', np.int16)])
array([[(1, 2)],
       [(4, 5)]], dtype=[('width', '<i2'), ('length', '<i2')])
-
-
class
fanc.matrix.
RegionMatrixContainer
¶ Bases:
fanc.matrix.RegionPairsContainer
,fanc.regions.RegionBasedWithBins
Class representing matrices where pixels correspond to genomic region pairs.
This is the common interface for all matrix-based classes, such as Hic or FoldChangeMatrix. It provides access to specialised matrix methods, most importantly matrix(), which assembles numpy arrays from the list of pairwise contacts stored in each object.
It inherits all region methods from RegionBased, and all edge/contact methods from RegionPairsContainer. You can use the same type of keys for matrix() that you would use for edges(), and additionally have the option to retrieve the observed/expected matrix.
import fanc
import numpy as np
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
# get the whole-genome matrix
m = hic.matrix()
type(m)  # fanc.matrix.RegionMatrix
isinstance(m, np.ndarray)  # True
m.shape  # 139, 139
# get just the chromosome 18 intra-chromosomal matrix
m = hic.matrix(('chr18', 'chr18'))
m.shape  # 79, 79
# get all rows of the whole-genome matrix
# corresponding to chromosome 18
m = hic.matrix('chr18')
m.shape  # 79, 139
# get unnormalised chromosome 18 matrix
m = hic.matrix(('chr18', 'chr18'), norm=False)
# get chromosome 18 O/E matrix
m = hic.matrix(('chr18', 'chr18'), oe=True)
# get log2-transformed chromosome 18 O/E matrix
m = hic.matrix(('chr18', 'chr18'), oe=True, log=True)
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
check_nodes_exist – Make sure that there are nodes that match source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
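A small, self-contained sketch can illustrate the accepted edge forms. The helper normalise_edge and the SimpleEdge class below are hypothetical and are not part of FAN-C. They only mirror the documented inputs (an Edge-like object, a dict with "source" and "sink" keys, or a list/tuple of length 2 or 3) by converting each one to a (source, sink, weight) tuple. None stands in for a missing weight.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple


@dataclass
class SimpleEdge:
    """Hypothetical stand-in for an Edge object with source, sink and weight."""
    source: int
    sink: int
    weight: Optional[float] = None


def normalise_edge(edge: Any) -> Tuple[int, int, Optional[float]]:
    """Hypothetical helper: convert a supported edge form to (source, sink, weight)."""
    if isinstance(edge, dict):
        # dict input: 'source' and 'sink' are required, 'weight' is optional
        return edge["source"], edge["sink"], edge.get("weight")
    if isinstance(edge, (list, tuple)):
        # list/tuple input: (source, sink) or (source, sink, weight)
        if len(edge) == 2:
            return edge[0], edge[1], None
        if len(edge) == 3:
            return edge[0], edge[1], edge[2]
        raise ValueError("list/tuple edges must have length 2 or 3")
    if hasattr(edge, "source") and hasattr(edge, "sink"):
        # Edge-like object: read the attributes directly
        return edge.source, edge.sink, getattr(edge, "weight", None)
    raise TypeError(f"unsupported edge type: {type(edge).__name__}")


examples = [SimpleEdge(0, 5, 12.0), {"source": 1, "sink": 3}, [2, 2], (4, 7, 0.5)]
for e in examples:
    print(normalise_edge(e))
```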
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from Edge input.
- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from a source region index, a sink region index, and an optional weight.
- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. This method repeatedly calls add_edge(), so it may be inefficient for large amounts of data.
- Parameters
edges – List (or iterator) of edges. See add_edge() for details
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a GenomicRegion, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
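Here is a minimal parser sketch for the documented string form ‘<chromosome>:<start>-<end>[:<strand>]’. It is not FAN-C's parser. The name parse_region_string is hypothetical, and the sketch makes two assumptions: coordinates are plain integers (no thousands separators or unit suffixes such as the "mb" shorthand used for RegionMatrix subsetting), and strands are written as "+" or "-".

```python
import re
from typing import NamedTuple, Optional

# '<chromosome>:<start>-<end>' with an optional ':<strand>' suffix
_REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$")


class ParsedRegion(NamedTuple):
    chromosome: str
    start: int
    end: int
    strand: Optional[str]


def parse_region_string(text: str) -> ParsedRegion:
    """Hypothetical sketch: split a region string into its components."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a '<chromosome>:<start>-<end>[:<strand>]' string: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError("end must not be smaller than start")
    return ParsedRegion(match["chrom"], start, end, match["strand"])


print(parse_region_string("chr18:1000001-2000000"))
print(parse_region_string("chr19:5-10:-"))
```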
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
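Plain Python is enough to sketch the idea of binning intervals into a fixed number of bins. This is not FAN-C's implementation. The name bin_scores is hypothetical, and the sketch makes two assumptions about aggregation: scores are combined as an overlap-weighted mean, and bins without any overlapping interval become NaN. Coordinates are treated as half-open [start, end) for simplicity. Smoothing, NaN replacement and zero_to_nan are omitted.

```python
import math
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[float, float, float]  # (start, end, score), half-open [start, end)


def bin_scores(intervals: Iterable[Interval], bins: int,
               interval_range: Optional[Tuple[float, float]] = None) -> List[Interval]:
    """Hypothetical sketch: average interval scores into `bins` equal-width bins."""
    intervals = list(intervals)
    if interval_range is None:
        # default to the span covered by the intervals themselves
        interval_range = (min(s for s, _, _ in intervals), max(e for _, e, _ in intervals))
    range_start, range_end = interval_range
    width = (range_end - range_start) / bins

    result = []
    for b in range(bins):
        bin_start = range_start + b * width
        bin_end = range_start + (b + 1) * width
        weighted_sum = 0.0
        covered = 0.0
        for start, end, score in intervals:
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                weighted_sum += overlap * score
                covered += overlap
        # bins without any overlapping interval get NaN
        mean = weighted_sum / covered if covered > 0 else math.nan
        result.append((bin_start, bin_end, mean))
    return result


for row in bin_scores([(0, 50, 1.0), (50, 100, 3.0), (120, 200, 2.0)], bins=4):
    print(row)
```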
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of range of interval to be binned. Useful if intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
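With bin_intervals_equidistant, the bin size is fixed instead of the bin count. The sketch below shows a hypothetical helper for computing such bin boundaries. It assumes the last bin ends at the range end and may therefore be shorter than bin_size. That truncation is an assumption of this sketch, not documented FAN-C behaviour.

```python
from typing import List, Tuple


def equidistant_bins(range_start: int, range_end: int, bin_size: int) -> List[Tuple[int, int]]:
    """Hypothetical sketch: half-open [start, end) bins of width bin_size covering a range."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    boundaries = []
    start = range_start
    while start < range_end:
        end = min(start + bin_size, range_end)  # last bin is truncated at range_end
        boundaries.append((start, end))
        start = end
    return boundaries


print(equidistant_bins(0, 250, 100))  # three bins, the last one 50 bp long
```

With this choice, the number of bins is the range length divided by bin_size, rounded up.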
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns GenomicRegion objects instead of tuples.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The returned bin ranges are range-compatible (the end index is exclusive), i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not bin 5.
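The end-exclusive convention can be illustrated with a short sketch that derives such a dictionary from an ordered list of region chromosomes. The function name and input format are hypothetical. The sketch assumes regions are sorted so that each chromosome occupies one contiguous block of bin indices.

```python
from typing import Dict, List, Sequence


def chromosome_bin_ranges(region_chromosomes: Sequence[str]) -> Dict[str, List[int]]:
    """Hypothetical sketch: map chromosome -> [first_bin, last_bin + 1]."""
    ranges: Dict[str, List[int]] = {}
    for ix, chromosome in enumerate(region_chromosomes):
        if chromosome not in ranges:
            ranges[chromosome] = [ix, ix + 1]
        else:
            ranges[chromosome][1] = ix + 1  # end stays exclusive
    return ranges


bins = chromosome_bin_ranges(["chr1"] * 5 + ["chr2"] * 3)
print(bins)  # {'chr1': [0, 5], 'chr2': [5, 8]}
print(list(range(*bins["chr1"])))  # [0, 1, 2, 3, 4]
```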
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ Get a list of chromosome names.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
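Both conversions amount to scaling by the bin size. The minimal sketch below assumes all bins have the same size (as bin_size does) and that the conversion to base pairs rounds to the nearest integer. The actual rounding used by FAN-C is not documented here. The two function names are hypothetical stand-ins for bins_to_distance and distance_to_bins.

```python
def bins_to_bp(bins: float, bin_size: int) -> int:
    """Hypothetical sketch of bins_to_distance: fraction of bins -> base pairs."""
    return int(round(bins * bin_size))


def bp_to_bins(distance: int, bin_size: int) -> float:
    """Hypothetical sketch of distance_to_bins: base pairs -> fraction of bins."""
    return distance / bin_size


bin_size = 1_000_000  # e.g. a 1 Mb binned Hi-C object
print(bins_to_bp(2.5, bin_size))      # 2500000
print(bp_to_bins(250_000, bin_size))  # 0.25
```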
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over a specific edge attribute.
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
- Returns
generator (Edge)
-
property
edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.
import fanc
# file from FAN-C examples
hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:
len(hic.edges)  # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:
for edge in hic.edges:
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...
Calling edges() as a method has the same effect:
# note the '()'
for edge in hic.edges():
    # do something with edge
    print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...
Rather than iterating over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion, only edges between the two regions are returned.
# select all edges between chromosome 19
# and any other region:
for edge in hic.edges("chr19"):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...
# select all edges that are only on
# chromosome 19
for edge in hic.edges(('chr19', 'chr19')):
    print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...
# select inter-chromosomal edges
# between chromosomes 18 and 19
for edge in hic.edges(('chr18', 'chr19')):
    print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...
By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.
for edge in hic.edges('chr18', lazy=True):
    print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...
Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, while list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:
for edge in hic.edges('chr18', lazy=True, norm=False):
    print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...
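The lazy-iterator pitfall described in the warning above is not specific to FAN-C. Any iterator that recycles one mutable object behaves this way. The pure-Python sketch below uses a hypothetical ReusedEdge class (not a FAN-C class). It shows why collecting lazily iterated objects into a list yields identical entries, and how copying the needed values inside the loop avoids the problem.

```python
from typing import Iterator, List, Tuple


class ReusedEdge:
    """Hypothetical stand-in for a lazily loaded edge whose attributes get overwritten."""

    def __init__(self) -> None:
        self.source = self.sink = 0
        self.weight = 0.0


def lazy_edges(rows: List[Tuple[int, int, float]]) -> Iterator[ReusedEdge]:
    edge = ReusedEdge()  # a single object reused in every iteration
    for source, sink, weight in rows:
        edge.source, edge.sink, edge.weight = source, sink, weight
        yield edge


rows = [(42, 42, 2120.0), (24, 28, 388.0), (5, 76, 94.0)]

# Pitfall: all list entries are the same object, holding the last row's values
stored = list(lazy_edges(rows))
print([(e.source, e.sink) for e in stored])  # [(5, 76), (5, 76), (5, 76)]

# Safe: extract the values you need while still inside the loop
extracted = [(e.source, e.sink, e.weight) for e in lazy_edges(rows)]
print(extracted)
```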
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
- Returns
Iterator over Edge or equivalent.
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
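The NumPy sketch below roughly illustrates the averaging described above. It computes distance-dependent expected values for a single dense, symmetric intra-chromosomal matrix while ignoring unmappable bins, then derives an observed/expected (O/E) matrix. It is not FAN-C's implementation, which also reports per-chromosome and inter-chromosomal values. The function names, the dense-matrix input and the explicit mappability vector are assumptions of this sketch.

```python
import numpy as np


def expected_by_distance(matrix: np.ndarray, mappable: np.ndarray) -> np.ndarray:
    """Hypothetical sketch: mean contact weight per diagonal, over mappable bin pairs only."""
    n = matrix.shape[0]
    expected = np.full(n, np.nan)
    for d in range(n):
        i = np.arange(n - d)
        j = i + d
        valid = mappable[i] & mappable[j]  # skip pairs involving unmappable bins
        if valid.any():
            # sum of observed weights divided by the number of possible (mappable) pairs
            expected[d] = matrix[i[valid], j[valid]].sum() / valid.sum()
    return expected


def observed_over_expected(matrix: np.ndarray, mappable: np.ndarray) -> np.ndarray:
    expected = expected_by_distance(matrix, mappable)
    n = matrix.shape[0]
    distance = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    oe = matrix / expected[distance]
    # set rows/columns of unmappable bins to NaN (a RegionMatrix would mask them instead)
    oe[~mappable, :] = np.nan
    oe[:, ~mappable] = np.nan
    return oe


m = np.array([[4.0, 2.0, 1.0, 0.0],
              [2.0, 6.0, 2.0, 0.0],
              [1.0, 2.0, 2.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]])
mappable = np.array([True, True, True, False])  # bin 3 has no contacts
print(expected_by_distance(m, mappable))
print(observed_over_expected(m, mappable))
```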
-
expected_values_and_marginals
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
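A binary search over sorted bin end coordinates is one way to sketch the lookup described above, which locates the bin containing the center of a query region. The following are assumptions of this sketch, not FAN-C internals: the helper name, the 1-based inclusive (chromosome, start, end) tuples, the integer center (start + end) // 2, and returning None when no bin contains the center.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def find_center_bin(bins: List[Region], query: Region) -> Optional[int]:
    """Hypothetical sketch: index of the bin containing the center of `query`."""
    chromosome, start, end = query
    center = (start + end) // 2
    # indices and end coordinates of bins on the query chromosome, in genome order
    candidates = [ix for ix, (c, _, _) in enumerate(bins) if c == chromosome]
    ends = [bins[ix][2] for ix in candidates]
    pos = bisect_left(ends, center)  # first bin whose end >= center
    if pos == len(candidates):
        return None
    ix = candidates[pos]
    return ix if bins[ix][1] <= center else None


genome_bins = [("chr18", 1, 1_000_000), ("chr18", 1_000_001, 2_000_000),
               ("chr18", 2_000_001, 3_000_000), ("chr19", 1, 1_000_000)]
print(find_center_bin(genome_bins, ("chr18", 500_001, 2_500_000)))  # center 1500000 -> bin 1
```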
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
-
marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.
By default, corrected matrix entries are summed up. To get uncorrected matrix marginals, use norm=False. Generally, all parameters accepted by edges() are supported.
- Parameters
masked – Use a numpy masked array to mask entries corresponding to unmappable regions
kwargs – Keyword arguments passed to
edges()
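The marginals can be sketched from a list of (source, sink, weight) contacts stored once per pair (source <= sink, as for Edge). The function name and input format are hypothetical. The sketch rests on three assumptions. An off-diagonal contact is added to the marginals of both bins, because the matrix is symmetric. A diagonal contact is added only once. Bins whose marginal is zero are treated as unmappable and masked, in line with the mappable() definition above.

```python
from typing import Iterable, Tuple

import numpy as np


def marginals_from_edges(edges: Iterable[Tuple[int, int, float]], n_bins: int,
                         masked: bool = True) -> np.ndarray:
    """Hypothetical sketch: row sums of the symmetric matrix implied by upper-triangle edges."""
    sums = np.zeros(n_bins)
    for source, sink, weight in edges:
        sums[source] += weight
        if source != sink:
            sums[sink] += weight  # mirror the contact into the lower triangle
    if masked:
        # bins without any contact are considered unmappable and get masked
        return np.ma.masked_equal(sums, 0.0)
    return sums


edges = [(0, 0, 2.0), (0, 1, 1.0), (1, 2, 3.0), (2, 2, 1.0)]
print(marginals_from_edges(edges, n_bins=4))                # bin 3 is masked
print(marginals_from_edges(edges, n_bins=4, masked=False))  # bin 3 is 0.0
```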
-
matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a RegionMatrix from region pairs.
- Parameters
key – Matrix selector. See edges() for all supported key types
log – If True, log-transform the matrix entries. Also see log_base
log_base – Base of the log transformation. Default: 2; only used when log=True
default_value – (optional) set the default value of matrix entries that have no associated edge/contact
mask – If False, do not mask unmappable regions
args – Positional arguments passed to regions_and_matrix_entries()
kwargs – Keyword arguments passed to regions_and_matrix_entries()
- Returns
RegionMatrix
-
classmethod
merge
(pairs, *args, **kwargs)¶ Merge two or more RegionPairsContainer objects.
- Parameters
pairs – list of RegionPairsContainer
args – Positional arguments passed to constructor of this class
kwargs – Keyword arguments passed to constructor of this class
-
possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It will calculate one number for inter-chromosomal pairs, return a list with the number of possible pairs where the list index corresponds to the number of bins separating two regions, and a dictionary of lists for each chromosome.
- Returns
possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
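The counting described above can be illustrated with a pure-Python sketch over a list of (chromosome, mappable) bins. The function name and input format are hypothetical. Counting each unordered pair once, including a bin paired with itself at distance 0, is an assumption of this sketch rather than documented FAN-C behaviour. The return order follows the documented one.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def possible_pairs(bins: Sequence[Tuple[str, bool]]) -> Tuple[List[int], Dict[str, List[int]], int]:
    """Hypothetical sketch: count mappable bin pairs by separation distance (in bins)."""
    by_chromosome: Dict[str, List[bool]] = defaultdict(list)
    for chromosome, is_mappable in bins:
        by_chromosome[chromosome].append(is_mappable)

    per_chromosome: Dict[str, List[int]] = {}
    max_len = max(len(m) for m in by_chromosome.values())
    genome_wide = [0] * max_len
    for chromosome, mappable in by_chromosome.items():
        n = len(mappable)
        counts = [0] * n
        for d in range(n):
            # unordered pairs (i, i + d) where both bins are mappable
            counts[d] = sum(mappable[i] and mappable[i + d] for i in range(n - d))
            genome_wide[d] += counts[d]
        per_chromosome[chromosome] = counts

    # inter-chromosomal: every mappable bin paired with every mappable bin elsewhere
    mappable_counts = [sum(m) for m in by_chromosome.values()]
    total = sum(mappable_counts)
    inter = sum(c * (total - c) for c in mappable_counts) // 2
    return genome_wide, per_chromosome, inter


bins = [("chr1", True), ("chr1", True), ("chr1", False), ("chr2", True), ("chr2", True)]
intra, intra_by_chrom, inter = possible_pairs(bins)
print(intra, intra_by_chrom, inter)
```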
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.
- Returns
slice
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a GenomicRegion object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
-
regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
- Parameters
key – Edge key, see edges()
oe – If True, will divide observed values by their expected value at the given distance. False by default
oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the _default_score_field attribute of the matrix class.
args – Positional arguments passed to edges()
kwargs – Keyword arguments passed to edges()
- Returns
list of row regions, list of col regions, iterator over (i, j, weight) tuples
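The sketch below shows how (i, j, weight) triples like the ones returned here could be turned into a dense matrix. It fills a NumPy array, copies each entry to its transposed position and optionally applies a logarithm, in the spirit of the default_value, log and log_base options of matrix(). It is not FAN-C's implementation. The function name is hypothetical, and i and j are assumed to be positions in a square, symmetric matrix whose entries are stored once per pair.

```python
import numpy as np


def assemble_matrix(entries, n, default_value=0.0, log=False, log_base=2):
    """Hypothetical sketch: dense symmetric n x n matrix from (i, j, weight) triples."""
    m = np.full((n, n), default_value, dtype=float)
    for i, j, weight in entries:
        m[i, j] = weight
        m[j, i] = weight  # entries are assumed to be stored once per pair
    if log:
        with np.errstate(divide="ignore"):
            # log_b(x) = ln(x) / ln(b); zeros become -inf
            m = np.log(m) / np.log(log_base)
    return m


entries = [(0, 0, 8.0), (0, 1, 4.0), (1, 2, 2.0)]
print(assemble_matrix(entries, 3))
print(assemble_matrix(entries, 3, log=True))
```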
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs – list of RegionBased objects
- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
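The comparison rule stated above (same chromosome, start and end at every list position) can be sketched as follows. The tuple-based input and the function name are hypothetical stand-ins for RegionBased objects. Treating lists of different lengths as non-identical is an assumption.

```python
from typing import Sequence, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end)


def regions_identical_sketch(region_lists: Sequence[Sequence[Region]]) -> bool:
    """Hypothetical sketch: True if all lists hold the same regions in the same positions."""
    if not region_lists:
        return True
    reference = [tuple(r) for r in region_lists[0]]
    # list equality is False for different lengths or any differing (chromosome, start, end)
    return all([tuple(r) for r in regions] == reference for regions in region_lists[1:])


a = [("chr18", 1, 1_000_000), ("chr18", 1_000_001, 2_000_000)]
b = list(a)
c = [("chr18", 1, 1_000_000), ("chr19", 1_000_001, 2_000_000)]
print(regions_identical_sketch([a, b]), regions_identical_sketch([a, c]))  # True False
```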
-
scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio of the number of contacts in this Hic object to the number of contacts in another Hic object.
- Parameters
matrix – A Hic object
weight_column – Name of the column to calculate the scaling factor on
- Returns
float
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file.
- Parameters
file_name – Path of file to write regions to
subset – optional GenomicRegion or str to write only regions overlapping this region
kwargs – Passed to write_bed()
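For orientation, the sketch below writes regions as BED lines using only the standard library. It is not FAN-C's write_bed(). The function name and region tuples are hypothetical. It also assumes regions use 1-based, inclusive coordinates (as in region strings such as chr18:1-1000000), which are converted to BED's 0-based, half-open convention by subtracting 1 from the start.

```python
import io
from typing import Iterable, Optional, TextIO, Tuple

Region = Tuple[str, int, int, Optional[str]]  # (chromosome, start, end, name), 1-based inclusive


def write_bed_sketch(regions: Iterable[Region], out: TextIO) -> None:
    """Hypothetical sketch: write regions as tab-separated BED lines."""
    for chromosome, start, end, name in regions:
        fields = [chromosome, str(start - 1), str(end)]  # BED start is 0-based
        if name is not None:
            fields.append(name)
        out.write("\t".join(fields) + "\n")


buffer = io.StringIO()
write_bed_sketch([("chr18", 1, 1_000_000, None), ("chr18", 1_000_001, 2_000_000, "bin1")], buffer)
print(buffer.getvalue(), end="")
```

To write to a file instead of a StringIO, open the target path in text mode and pass the file handle as out.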
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional GenomicRegion or str to write only regions overlapping this region
kwargs – Passed to write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file.
- Parameters
file_name – Path of file to write regions to
subset – optional GenomicRegion or str to write only regions overlapping this region
kwargs – Passed to write_gff()
-
-
class
fanc.matrix.
RegionMatrixTable
(file_name=None, mode='a', tmpdir=None, partition_strategy='auto', additional_region_fields=None, additional_edge_fields=None, default_score_field='weight', default_value=0.0, _table_name_regions='regions', _table_name_edges='edges', _table_name_expected_values='expected_values', _edge_buffer_size='3G')¶ Bases:
fanc.matrix.RegionMatrixContainer
,fanc.matrix.RegionPairsTable
HDF5 implementation of the RegionMatrixContainer interface.
-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription
¶ Bases:
tables.description.IsDescription
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge – Edge, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
check_nodes_exist – Make sure that there are nodes that match source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from Edge input.
- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from a source region index, a sink region index, and an optional weight.
- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so it may be inefficient for large amounts of data.
- Parameters
edges – List (or iterator) of edges. See
add_edge()
for details
-
add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
- Parameters
name (str) – name of the mask
description (str) – description of the mask
- Returns
id of the mask
- Return type
int
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
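As an illustration of the string form, the sketch below parses ‘<chromosome>:<start>-<end>[:<strand>]’ into its parts. parse_region_string and RegionSpec are hypothetical names; the real parsing presumably happens in the genomic_regions package on which this module is built, and it may accept more variants (for example other strand notations) than this regular expression does.

```python
import re
from typing import NamedTuple, Optional


class RegionSpec(NamedTuple):
    chromosome: str
    start: int
    end: int
    strand: Optional[str]


# '<chromosome>:<start>-<end>[:<strand>]'; strand is assumed to be '+' or '-'
_REGION_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$")


def parse_region_string(text: str) -> RegionSpec:
    """Hypothetical parser (not FAN-C API) for the region string form described above."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a region string: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError("start must not be larger than end")
    return RegionSpec(match["chrom"], start, end, match["strand"])


print(parse_region_string("chr18:1-1000000"))
print(parse_region_string("chr1:100-200:-"))
```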
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of the range of intervals to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
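One plausible way to bin intervals into a fixed number of bins is an overlap-weighted mean of the scores falling into each bin. The sketch below assumes half-open [start, end) coordinates and implements only interval_range and zero_to_nan; smoothing and NaN replacement are left out, and FAN-C's own weighting may differ. bin_intervals_sketch is a hypothetical name.

```python
import math
from typing import Iterable, List, Optional, Tuple

Interval = Tuple[float, float, float]  # (start, end, score)


def bin_intervals_sketch(intervals: Iterable[Interval], bins: int,
                         interval_range: Optional[Tuple[float, float]] = None,
                         zero_to_nan: bool = False) -> List[Interval]:
    """Hypothetical helper: overlap-weighted mean score in each of `bins` equal bins."""
    intervals = list(intervals)
    if interval_range is None:
        start = min(s for s, _, _ in intervals)
        end = max(e for _, e, _ in intervals)
    else:
        start, end = interval_range
    width = (end - start) / bins
    result = []
    for b in range(bins):
        b_start = start + b * width
        b_end = start + (b + 1) * width
        weighted_sum = 0.0
        covered = 0.0
        for s, e, score in intervals:
            overlap = min(e, b_end) - max(s, b_start)
            if overlap > 0:
                weighted_sum += overlap * score
                covered += overlap
        score = weighted_sum / covered if covered > 0 else 0.0
        if zero_to_nan and score == 0:
            score = math.nan
        result.append((b_start, b_end, score))
    return result


data = [(0, 10, 1.0), (10, 20, 3.0)]
print(bin_intervals_sketch(data, 1))  # [(0.0, 20.0, 2.0)]
padded = bin_intervals_sketch(data, 2, interval_range=(0, 40), zero_to_nan=True)
print(padded)                         # [(0.0, 20.0, 2.0), (20.0, 40.0, nan)]
```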
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of the range of intervals to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
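With a fixed bin_size, the number of bins follows from the length of the range instead of being given. The sketch below only derives the bin boundaries (half-open coordinates assumed); each resulting bin could then be scored in the same way as bin_intervals() scores its bins. equidistant_bins is a hypothetical helper, and how FAN-C treats a final partial bin is not stated here.

```python
from typing import List, Tuple


def equidistant_bins(start: int, end: int, bin_size: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: split [start, end) into consecutive bins of bin_size bp."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # the final bin is truncated at `end` (an assumption, not documented FAN-C behaviour)
    return [(b_start, min(b_start + bin_size, end)) for b_start in range(start, end, bin_size)]


print(equidistant_bins(0, 25, 10))  # [(0, 10), (10, 20), (20, 25)]
```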
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert a (possibly fractional) number of bins to base pairs.
- Parameters
bins – float, number of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The [start, end] pairs are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not bin 5.
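Range compatibility means the end index is exclusive, so range(start, end) yields exactly the bin indices of a chromosome. The sketch below derives such a dictionary from the chromosome name of each region, assuming regions are sorted so that each chromosome's bins are contiguous; chromosome_bins_sketch is a hypothetical name.

```python
from typing import Dict, List, Sequence


def chromosome_bins_sketch(region_chromosomes: Sequence[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: {chromosome: [first_bin, last_bin + 1]} from ordered region chromosomes."""
    bins: Dict[str, List[int]] = {}
    for ix, chrom in enumerate(region_chromosomes):
        if chrom not in bins:
            bins[chrom] = [ix, ix + 1]
        else:
            bins[chrom][1] = ix + 1  # extend the exclusive end index
    return bins


bins = chromosome_bins_sketch(["chr1", "chr1", "chr1", "chr2", "chr2"])
print(bins)                        # {'chr1': [0, 3], 'chr2': [3, 5]}
print(list(range(*bins["chr2"])))  # [3, 4]
```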
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table.
- Returns
list of chromosome names
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If the file was opened with tmpdir in read-only mode: close the file and delete the temporary copy.
If the file was opened with tmpdir in write or append mode: replace the original file with the copy and delete the copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
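Both conversions amount to a multiplication or division by the bin size, which, as bin_size notes, assumes all bins have equal size. The sketch below passes the bin size explicitly; bins_to_bp and bp_to_bins are hypothetical names, and rounding to the nearest base pair is an assumption.

```python
def bins_to_bp(bins: float, bin_size: int) -> int:
    # rounding to the nearest base pair is an assumption
    return int(round(bins * bin_size))


def bp_to_bins(distance: int, bin_size: int) -> float:
    return distance / bin_size


# with the 1 Mb bins of the FAN-C example file
print(bins_to_bp(2.5, 1_000_000))        # 2500000
print(bp_to_bins(1_500_000, 1_000_000))  # 1.5
```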
-
downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
- Parameters
n – Sample size or reference object. If n < 1, it will be interpreted as a fraction of the total reads in this object.
file_name – Output file name for down-sampled object.
- Returns
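Down-sampling raw counts can be thought of as drawing n reads uniformly without replacement from all reads and re-counting them per region pair. The sketch below does that with the standard library for a dict of raw counts; downsample_counts is hypothetical, it does not accept a reference object as n, and FAN-C's sampling strategy may differ.

```python
import bisect
import itertools
import random
from typing import Dict, Optional, Tuple

Pair = Tuple[int, int]


def downsample_counts(counts: Dict[Pair, int], n: float, seed: Optional[int] = None) -> Dict[Pair, int]:
    """Hypothetical helper: sample n reads (or a fraction if n < 1) from raw pair counts."""
    pairs = list(counts)
    cumulative = list(itertools.accumulate(counts[p] for p in pairs))
    total = cumulative[-1] if cumulative else 0
    if n < 1:
        n = int(total * n)  # interpret n as a fraction of all reads
    n = int(n)
    if n > total:
        raise ValueError("cannot sample more reads than available")
    rng = random.Random(seed)
    sampled: Dict[Pair, int] = {}
    for read_ix in rng.sample(range(total), n):
        # the read belongs to the first pair whose cumulative count exceeds read_ix
        pair = pairs[bisect.bisect_right(cumulative, read_ix)]
        sampled[pair] = sampled.get(pair, 0) + 1
    return sampled


counts = {(0, 0): 3, (0, 1): 5}
half = downsample_counts(counts, 0.5, seed=1)
print(sum(half.values()))  # 4
```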
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.
- Returns
generator (
Edge
)
-
property
edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

    import fanc

    # file from FAN-C examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

    len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...

Calling edges() as a method has the same effect:

    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...

Rather than iterating over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion objects, only edges between the two regions are returned.

    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
        # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
        # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
        # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
        # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
        # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
        # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
        # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
        # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
        # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
        # ...

By default, edges() will retrieve all edge attributes, which can be slow when iterating over a lot of edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand.

    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
        # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
        # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
        # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
        # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
        # ...

Warning
The lazy iterator reuses the LazyEdge object in every iteration, and overwrites the LazyEdge attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, while list(hic.edges(lazy=True)) will result in a list of identical LazyEdge objects. Always ensure you do all edge processing in the loop when working with lazy iterators!
When working with normalised contact frequencies, such as obtained through matrix balancing in the example above,
edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.
When you are interested in the raw contact frequency, use the norm=False parameter:

    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
        # 42 42 2120.0
        # 24 28 388.0
        # 5 76 94.0
        # 66 68 470.0
        # ...

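The example values above are consistent with the normalised weight being the raw contact count multiplied by the edge's bias: for 42--42, 2120.0 * 5.797788472650082e-05 ≈ 0.12291311562018173. The snippet below checks this arithmetic for the four chr18 edges printed in both examples; it does not call FAN-C, and treating the relation as general is an inference from these numbers rather than a documented guarantee.

```python
import math

# (raw weight from norm=False, bias, normalised weight), copied from the chr18 examples above
example_edges = {
    "42--42": (2120.0, 5.797788472650082e-05, 0.12291311562018173),
    "24--28": (388.0, 6.496381719803623e-05, 0.025205961072838057),
    "5--76": (94.0, 0.00010230955745211447, 0.00961709840049876),
    "66--68": (470.0, 8.248432587969082e-05, 0.03876763316345468),
}


def matches_raw_times_bias(raw: float, bias: float, normalised: float, rel_tol: float = 1e-9) -> bool:
    """Check whether raw * bias reproduces the normalised weight within rel_tol."""
    return math.isclose(raw * bias, normalised, rel_tol=rel_tol)


for name, (raw, bias, normalised) in example_edges.items():
    print(name, matches_raw_times_bias(raw, bias, normalised))  # True for all four edges
```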
You can also choose to omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
- Returns
Iterator over
Edge
or equivalent.
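The warning about lazy iterators can be reproduced without FAN-C. In the sketch below, a hypothetical generator yields one mutable object whose attributes are overwritten on every step, which is the behaviour the warning describes for LazyEdge: storing the yielded objects keeps only the last state, while extracting values inside the loop is safe.

```python
from typing import Iterator, List, Tuple


class ReusedRow:
    """Hypothetical stand-in for a lazily loaded edge: one object, overwritten each step."""

    def __init__(self) -> None:
        self.source = -1
        self.sink = -1


def lazy_rows(pairs: List[Tuple[int, int]]) -> Iterator[ReusedRow]:
    row = ReusedRow()
    for source, sink in pairs:
        row.source, row.sink = source, sink  # overwrite the same object instead of creating a new one
        yield row


pairs = [(42, 42), (24, 28), (5, 76)]

stored = list(lazy_rows(pairs))  # three references to one object
print([(r.source, r.sink) for r in stored])  # [(5, 76), (5, 76), (5, 76)]

safe = [(r.source, r.sink) for r in lazy_rows(pairs)]  # values copied inside the loop
print(safe)  # [(42, 42), (24, 28), (5, 76)]
```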
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
expected_values
(selected_chromosome=None, norm=True, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
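For a single chromosome, the per-distance average can be sketched on a small dense matrix: for each separation d, sum the weights of pairs (i, i + d) in which both regions are mappable and divide by the number of such pairs. expected_by_distance is a hypothetical helper; it omits the genome-wide and inter-chromosomal values and works on an in-memory list of lists rather than on stored edges.

```python
from typing import List, Sequence


def expected_by_distance(matrix: Sequence[Sequence[float]], mappable: Sequence[bool]) -> List[float]:
    """Hypothetical helper: mean weight per diagonal, using only mappable region pairs."""
    n = len(matrix)
    expected = []
    for d in range(n):
        total = 0.0
        pairs = 0
        for i in range(n - d):
            j = i + d
            if mappable[i] and mappable[j]:
                total += matrix[i][j]
                pairs += 1
        # 0.0 when no mappable pair exists at this distance (an assumption)
        expected.append(total / pairs if pairs else 0.0)
    return expected


m = [[4.0, 2.0, 1.0],
     [2.0, 6.0, 3.0],
     [1.0, 3.0, 8.0]]
print(expected_by_distance(m, [True, True, True]))   # [6.0, 2.5, 1.0]
print(expected_by_distance(m, [True, False, True]))  # [6.0, 0.0, 1.0]
```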
-
expected_values_and_marginals
(selected_chromosome=None, norm=True, force=False, *args, **kwargs)¶ Calculate the expected values for genomic contacts at all distances and the whole matrix marginals.
This calculates the expected values between genomic regions separated by a specific distance. Expected values are calculated as the average weight of edges between region pairs with the same genomic separation, taking into account unmappable regions.
It will return a tuple with three values: a list of genome-wide intra-chromosomal expected values (list index corresponds to number of separating bins), a dict with chromosome names as keys and intra-chromosomal expected values specific to each chromosome, and a float for inter-chromosomal expected value.
- Parameters
selected_chromosome – (optional) Chromosome name. If provided, will only return expected values for this chromosome.
norm – If False, will calculate the expected values on the unnormalised matrix.
args – Not used in this context
kwargs – Not used in this context
- Returns
list of intra-chromosomal expected values, dict of intra-chromosomal expected values by chromosome, inter-chromosomal expected value
-
filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a
MaskFilter
.
- Parameters
edge_filter – Class implementing MaskFilter.
queue – If True, the filter will be queued and can be executed along with other queued filters using run_queued_filters()
log_progress – If True, the progress of iterating through all edges will be continuously reported.
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region at the center of a query region.
- Parameters
query_regions – Region selector string, GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
- Parameters
silent – do not print flush progress
update_mappability – After writing data, update mappability and expected values
-
get_mask
(key)¶ Search the _mask table for key and return the Mask.
- Parameters
key (int or str) – mask ID (int) or mask name (str) to search by
- Returns
Mask
-
get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. This is similar in principle to UNIX chmod permissions (although those use base 8).
- Parameters
ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
- Returns
list of Masks extracted from ix
- Return type
list (Mask)
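Because every mask ID is a distinct power of 2, a stored integer can be decoded by testing each bit. The sketch below is a hypothetical helper that returns the IDs only; looking up the corresponding Mask objects, as get_masks() does, is not shown.

```python
from typing import List


def decode_mask_ids(ix: int) -> List[int]:
    """Hypothetical helper: split a sum of distinct powers of 2 into its parts."""
    if ix < 0:
        raise ValueError("ix must be non-negative")
    ids = []
    bit = 1
    while bit <= ix:
        if ix & bit:  # this power of 2 is part of the sum
            ids.append(bit)
        bit <<= 1
    return ids


combined = 1 + 4 + 8  # three masks stored in one integer field
print(decode_mask_ids(combined))  # [1, 4, 8]
```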
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
-
marginals
(masked=True, *args, **kwargs)¶ Get the marginals vector of this Hic matrix.
Sums up all contacts for each bin of the Hi-C matrix. Unmappable regions will be masked in the returned vector unless the masked parameter is set to False.
By default, corrected matrix entries are summed up. To get uncorrected matrix marginals, use norm=False. Generally, all parameters accepted by edges() are supported.
- Parameters
masked – Use a numpy masked array to mask entries corresponding to unmappable regions
kwargs – Keyword arguments passed to
edges()
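For a symmetric matrix stored as upper-triangle edges (source <= sink), each off-diagonal edge contributes to the marginals of both of its regions, while a diagonal edge contributes once. The sketch below computes marginals from a plain dict of edge weights and, as a simplification of the definition given for mappable(), treats a region with a zero marginal as unmappable; marginals_from_edges is hypothetical and does not reproduce FAN-C's masking.

```python
from typing import Dict, List, Tuple


def marginals_from_edges(edges: Dict[Tuple[int, int], float], n_regions: int) -> List[float]:
    """Hypothetical helper: row sums of a symmetric matrix stored as upper-triangle edges."""
    marginals = [0.0] * n_regions
    for (source, sink), weight in edges.items():
        marginals[source] += weight
        if source != sink:
            # an off-diagonal edge also stands for the mirrored (sink, source) entry
            marginals[sink] += weight
    return marginals


edges = {(0, 0): 2.0, (0, 1): 1.0, (1, 3): 4.0}
marginals = marginals_from_edges(edges, 4)
mappable = [m > 0 for m in marginals]  # simplification: zero marginal means unmappable
print(marginals)  # [3.0, 5.0, 0.0, 4.0]
print(mappable)   # [True, True, False, True]
```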
-
matrix
(key=None, log=False, default_value=None, mask=True, log_base=2, *args, **kwargs)¶ Assemble a
RegionMatrix
from region pairs.
- Parameters
key – Matrix selector. See edges() for all supported key types
log – If True, log-transform the matrix entries. Also see log_base
log_base – Base of the log transformation. Default: 2; only used when log=True
default_value – (optional) set the default value of matrix entries that have no associated edge/contact
mask – If False, do not mask unmappable regions
args – Positional arguments passed to
regions_and_matrix_entries()
kwargs – Keyword arguments passed to
regions_and_matrix_entries()
- Returns
RegionMatrix
-
classmethod
merge
(matrices, *args, **kwargs)¶ Merge multiple
RegionMatrixContainer
objects.
Merging is done by adding the weights of corresponding edges in each object.
- Parameters
matrices – list of
RegionMatrixContainer
- Returns
merged
RegionMatrixContainer
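Merging by adding edge weights amounts to summing weights per (source, sink) pair across objects. The sketch below works on plain dicts and assumes all inputs share identical regions (the property regions_identical() checks); merge_edge_weights is a hypothetical name, not the classmethod itself.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[int, int]


def merge_edge_weights(objects: Iterable[Dict[Pair, float]]) -> Dict[Pair, float]:
    """Hypothetical helper: sum edge weights per (source, sink) pair across objects."""
    merged: Counter = Counter()
    for edges in objects:
        for pair, weight in edges.items():
            merged[pair] += weight
    return dict(merged)


a = {(0, 1): 2.0, (1, 1): 1.0}
b = {(0, 1): 3.0, (2, 2): 4.0}
print(merge_edge_weights([a, b]))  # {(0, 1): 5.0, (1, 1): 1.0, (2, 2): 4.0}
```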
-
possible_contacts
()¶ Calculate the possible number of contacts in the genome.
This calculates the number of potential region pairs in a genome for any possible separation distance, taking into account the existence of unmappable regions.
It returns a list with the number of possible intra-chromosomal pairs, where the list index corresponds to the number of bins separating two regions, a dictionary with such a list for each chromosome, and a single number for inter-chromosomal pairs.
- Returns
possible intra-chromosomal pairs, possible intra-chromosomal pairs by chromosome, possible inter-chromosomal pairs
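Under the assumption that a pair counts as possible when both of its regions are mappable, the three return values can be sketched from per-chromosome mappability vectors: intra-chromosomal counts per bin separation, the same per chromosome, and, for inter-chromosomal pairs, the sum over chromosome pairs of the product of their mappable bin counts. possible_contacts_sketch is hypothetical and may differ from FAN-C in details such as how the diagonal is counted.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def possible_contacts_sketch(
    mappable_by_chrom: Dict[str, Sequence[bool]],
) -> Tuple[List[int], Dict[str, List[int]], int]:
    """Hypothetical helper: count region pairs in which both regions are mappable."""
    max_len = max((len(m) for m in mappable_by_chrom.values()), default=0)
    intra = [0] * max_len
    per_chrom: Dict[str, List[int]] = {}
    for chrom, mappable in mappable_by_chrom.items():
        n = len(mappable)
        # counts[d] = number of pairs (i, i + d) with both regions mappable
        counts = [sum(1 for i in range(n - d) if mappable[i] and mappable[i + d]) for d in range(n)]
        per_chrom[chrom] = counts
        for d, c in enumerate(counts):
            intra[d] += c
    # every mappable region on one chromosome can pair with every mappable region on another
    mappable_counts = [sum(m) for m in mappable_by_chrom.values()]
    inter = sum(x * y for x, y in combinations(mappable_counts, 2))
    return intra, per_chrom, inter


intra, per_chrom, inter = possible_contacts_sketch(
    {"chr1": [True, True, False, True], "chr2": [True, True]}
)
print(intra)      # [5, 2, 1, 1]
print(per_chrom)  # {'chr1': [3, 1, 1, 1], 'chr2': [2, 1]}
print(inter)      # 6
```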
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a GenomicRegion here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.
- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector-data to this object. If there is existing data in this object with the same name, it will be replaced.
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or GenomicRegion object for which covered bins will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. Can also be used to get the number of regions by calling len() on the object returned by this method.
- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
-
regions_and_matrix_entries
(key=None, score_field=None, *args, **kwargs)¶ Convenient access to non-zero matrix entries and associated regions.
- Parameters
key – Edge key, see
edges()
oe – If True, will divide observed values by their expected value at the given distance. False by default
oe_per_chromosome – If True (default), will do a per-chromosome O/E calculation rather than using the whole matrix to obtain expected values
score_field – (optional) any edge attribute that returns a number can be specified here for filling the matrix. Usually this is defined by the
_default_score_field
attribute of the matrix class.
args – Positional arguments passed to
edges()
kwargs – Keyword arguments passed to
edges()
- Returns
list of row regions, list of col regions, iterator over (i, j, weight) tuples
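The (i, j, weight) tuples can be turned into a dense matrix by starting from a default value and filling in each entry. The sketch below assumes a square selection where i and j are positions within the same region list, and mirrors each entry because only one triangle of a symmetric Hi-C matrix is assumed to be stored (source <= sink); entries_to_square_matrix is hypothetical, and matrix() itself returns a masked RegionMatrix rather than nested lists.

```python
from typing import Iterable, List, Tuple


def entries_to_square_matrix(n: int, entries: Iterable[Tuple[int, int, float]],
                             default_value: float = 0.0) -> List[List[float]]:
    """Hypothetical helper: dense symmetric n x n matrix from sparse (i, j, weight) entries."""
    m = [[default_value] * n for _ in range(n)]
    for i, j, weight in entries:
        m[i][j] = weight
        m[j][i] = weight  # mirror, since only one triangle is assumed to be stored
    return m


dense = entries_to_square_matrix(3, [(0, 0, 2.0), (0, 2, 1.5)])
print(dense)  # [[2.0, 0.0, 1.5], [0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
```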
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs – list of RegionBased objects
- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters
(log_progress=True)¶ Run queued filters.
- Parameters
log_progress – If True, the progress of iterating through all edges will be continuously reported.
-
scaling_factor
(matrix, weight_column=None)¶ Compute the scaling factor to another matrix.
Calculates the ratio of the number of contacts in this Hic object to the number of contacts in another Hic object.
- Parameters
matrix – A Hic object
weight_column – Name of the column to calculate the scaling factor on
- Returns
float
-
subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
- Parameters
regions – string or GenomicRegion object(s)
kwargs – Supports file_name: destination file name of the subset Hic object; tmpdir: if True, works in a temporary directory until the object is closed. Additional parameters are passed to edges()
- Returns
Hic
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as a BED file.
- Parameters
file_name – Path of the file to write regions to
subset – optional GenomicRegion or str to write only regions overlapping this region
kwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as a BigWig file.
- Parameters
file_name – Path of the file to write regions to
subset – optional GenomicRegion or str to write only regions overlapping this region
kwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as a GFF file.
- Parameters
file_name – Path of the file to write regions to
subset – optional GenomicRegion or str to write only regions overlapping this region
kwargs – Passed to
write_gff()
-
-
class
fanc.matrix.
RegionPairsContainer
¶ Bases:
genomic_regions.regions.RegionBased
Class representing pairs of genomic regions.
This is the basic interface for all pair and matrix classes in this module. It inherits all methods from RegionBased, and is therefore based on a list of genomic regions (GenomicRegion) representing the underlying genome. You can use the regions() method to access genomic regions in an intuitive fashion, for example:

    for region in rpc.regions('chr1'):
        # do something with region
        print(region)

For more details on region access, see the genomic_regions documentation, on which this module is built.
RegionPairsContainer adds methods for pairs of genomic regions on top of the RegionBased methods for individual regions. In the nomenclature of this module, which borrows from network analysis terminology, a pair of regions is represented by an Edge.

    # iterate over all region pairs / edges in chr1
    for edge in rpc.edges(("chr1", "chr1")):
        # do something with edge / region pair
        region1 = edge.source_node
        region2 = edge.sink_node

For more details, see the edges() method help.
This class itself is only an interface and cannot actually be used to add regions and region pairs. Implementations of this interface, i.e. subclasses such as RegionPairsTable, must override various hidden methods to give them full functionality:
_add_edge() is used to save region pairs / edges to the object. It receives a single Edge as input and should return the index of the added edge.
_edges_iter() is required by edges(). It is used to iterate over all edges in the object in no particular order. It should return a generator of Edge objects representing all region pairs in the object.
_edges_subset() is also used by edges(). It is used to iterate over a subset of edges in this object. It receives as input a key representing the requested subset (further described in edges()), and two lists of GenomicRegion objects, row_regions and col_regions, representing the two dimensions of regions selected by key. It should return an iterator over Edge objects.
_edges_getitem() is used by edges() for retrieval of edges by bracket notation. For integer input, it should return a single Edge; for slice input, a list of Edge objects.
The above methods cover all the basic RegionPairsContainer functionality, but for speed improvements you may also want to override the following method, which by default iterates over all edges:
_edges_length(), which returns the total number of edges in the object
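The division of labour between these hidden methods can be illustrated with a toy, standalone in-memory store. ToyPairsStore is hypothetical and does not subclass FAN-C: edges are plain (source, sink, weight) tuples, regions are bare integer indices, and its methods only play the roles of _add_edge(), _edges_iter(), _edges_subset(), _edges_getitem() and _edges_length() with simplified signatures.

```python
from typing import Iterator, List, Sequence, Tuple, Union

EdgeTuple = Tuple[int, int, float]  # (source, sink, weight)


class ToyPairsStore:
    """Hypothetical in-memory store mirroring the roles of the hidden methods."""

    def __init__(self) -> None:
        self._edges: List[EdgeTuple] = []

    def add_edge(self, edge: EdgeTuple) -> int:
        # role of _add_edge(): store one edge and return its index
        self._edges.append(edge)
        return len(self._edges) - 1

    def edges_iter(self) -> Iterator[EdgeTuple]:
        # role of _edges_iter(): all edges, in no particular order
        return iter(self._edges)

    def edges_subset(self, row_ixs: Sequence[int], col_ixs: Sequence[int]) -> Iterator[EdgeTuple]:
        # role of _edges_subset(): edges connecting the selected rows and columns
        rows, cols = set(row_ixs), set(col_ixs)
        for source, sink, weight in self._edges:
            if (source in rows and sink in cols) or (sink in rows and source in cols):
                yield (source, sink, weight)

    def edges_getitem(self, item: Union[int, slice]) -> Union[EdgeTuple, List[EdgeTuple]]:
        # role of _edges_getitem(): one edge for an int, a list for a slice
        return self._edges[item]

    def edges_length(self) -> int:
        # role of _edges_length(): a fast count instead of iterating over all edges
        return len(self._edges)


store = ToyPairsStore()
for e in [(0, 1, 2.0), (1, 1, 1.0), (2, 5, 0.5)]:
    store.add_edge(e)
print(store.edges_length())                   # 3
print(list(store.edges_subset([0, 1], [1])))  # [(0, 1, 2.0), (1, 1, 1.0)]
```

Keeping the storage behind these few methods is what lets the public methods such as edges() stay identical across backends.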
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge –
Edge
, dict with at least the attributes source and sink (optionally weight), or a list of length 2 (source, sink) or 3 (source, sink, weight).
check_nodes_exist – Make sure that there are nodes (regions) matching the source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes.
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.
- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from source and sink region indexes and an optional weight.
- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so it may be inefficient for large amounts of data.
- Parameters
edges – List (or iterator) of edges. See
add_edge()
for details
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of the range of intervals to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of the range of intervals to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ Get a list of chromosome names.
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for
edges()
.
- Returns
generator (
Edge
)
-
property
edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

    import fanc

    # file from FAN-C examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")

We can easily find the number of edges in the sample Hic object:

    len(hic.edges)  # 8695

When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

    for edge in hic.edges:
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...

Calling edges() as a method has the same effect:

    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
        # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
        # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
        # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
        # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
        # ...

Rather than iterating over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion objects, only edges between the two regions are returned:

    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...
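The selection rules above (a single key matches edges touching that region; a tuple matches edges between the two regions) can be modelled with a small toy sketch. This is not FAN-C code: the helper `select_edges` and the `region_chromosome` mapping are hypothetical, and keys are plain chromosome names rather than full region strings with coordinates.

```python
# Toy model of key-based edge selection (illustration only, not FAN-C code).
# Hypothetical mapping from region index to chromosome name.
region_chromosome = {0: "chr18", 1: "chr18", 2: "chr19", 3: "chr19"}

# Toy edges as (source, sink, weight), following the source <= sink convention
edges = [(0, 1, 5.0), (0, 2, 1.0), (1, 3, 2.0), (2, 3, 4.0), (3, 3, 7.0)]


def select_edges(edges, key):
    """Hypothetical helper: filter edges by a chromosome key.

    A single string keeps edges with at least one end on that chromosome;
    a 2-tuple keeps only edges connecting the two chromosomes (either order).
    """
    if isinstance(key, str):
        return [e for e in edges
                if key in (region_chromosome[e[0]], region_chromosome[e[1]])]
    wanted = sorted(key)
    return [e for e in edges
            if sorted((region_chromosome[e[0]], region_chromosome[e[1]])) == wanted]


print(select_edges(edges, ("chr18", "chr19")))  # [(0, 2, 1.0), (1, 3, 2.0)]
```

Comparing sorted chromosome pairs makes the tuple key order-independent, so `('chr19', 'chr18')` would select the same edges as `('chr18', 'chr19')` in this sketch.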
By default, edges() retrieves all edge attributes, which can be slow when iterating over many edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand:

    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...
Warning
The lazy iterator reuses the same LazyEdge object in every iteration and overwrites its attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, whereas list(hic.edges(lazy=True)) results in a list of identical LazyEdge objects. Always do all edge processing inside the loop when working with lazy iterators!

When working with normalised contact frequencies, such as those obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...
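The lazy-iterator warning above can be reproduced without FAN-C. In this minimal sketch, `ReusedEdge` is a hypothetical stand-in for LazyEdge and `lazy_edges` a hypothetical generator that, like the lazy iterator described above, yields one object whose attributes are overwritten on every step.

```python
class ReusedEdge:
    """Hypothetical stand-in for LazyEdge: a single mutable object."""

    def __init__(self):
        self.source = None
        self.sink = None
        self.weight = None


def lazy_edges(rows):
    """Yield the SAME ReusedEdge for every row, overwriting its attributes."""
    edge = ReusedEdge()  # created once, reused in every iteration
    for source, sink, weight in rows:
        edge.source, edge.sink, edge.weight = source, sink, weight
        yield edge


# toy (source, sink, weight) rows, not real data
rows = [(0, 0, 1.5), (0, 3, 0.2), (2, 5, 0.7)]

# Pitfall: list() stores three references to one object,
# which now only holds the values of the last row
stored = list(lazy_edges(rows))
print([(e.source, e.sink, e.weight) for e in stored])
# [(2, 5, 0.7), (2, 5, 0.7), (2, 5, 0.7)]

# Safe: copy the values you need while still inside the loop
extracted = [(e.source, e.sink, e.weight) for e in lazy_edges(rows)]
print(extracted)
# [(0, 0, 1.5), (0, 3, 0.2), (2, 5, 0.7)]
```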
You can also omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
- Returns
Iterator over Edge or equivalent.
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
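The idea of finding "the region at the center of a region" can be sketched as: take the midpoint of the query and look up the bin that contains it. This is a hypothetical illustration (`center_region_index` and the toy `regions` list are not FAN-C API); it assumes sorted, non-overlapping regions in 1-based inclusive coordinates, as in "chr18:1-1000000".

```python
import bisect

# Hypothetical 1 Mb bins in 1-based, inclusive coordinates,
# sorted by chromosome and start
regions = [
    ("chr18", 1, 1_000_000),
    ("chr18", 1_000_001, 2_000_000),
    ("chr18", 2_000_001, 3_000_000),
    ("chr19", 1, 1_000_000),
    ("chr19", 1_000_001, 2_000_000),
]


def center_region_index(regions, chromosome, start, end):
    """Return the index of the region containing the query midpoint, or None."""
    center = (start + end) // 2
    # indices and start positions of the regions on this chromosome
    indices = [i for i, (c, _, _) in enumerate(regions) if c == chromosome]
    starts = [regions[i][1] for i in indices]
    # position of the last region whose start is <= center
    pos = bisect.bisect_right(starts, center) - 1
    if pos < 0:
        return None
    ix = indices[pos]
    # the midpoint may lie beyond the last region of the chromosome
    return ix if regions[ix][2] >= center else None


print(center_region_index(regions, "chr18", 500_001, 2_500_000))  # 1
```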
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
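A minimal sketch of the mappability definition, using a hypothetical helper `mappable_from_edges` over toy (source, sink, weight) edges. It counts any non-zero contact, including a region's contact with itself on the diagonal; whether FAN-C counts diagonal-only regions as mappable is an assumption not settled by this description.

```python
def mappable_from_edges(n_regions, edges):
    """Hypothetical helper: True for regions with at least one non-zero contact."""
    mappable = [False] * n_regions
    for source, sink, weight in edges:
        if weight > 0:
            mappable[source] = True
            mappable[sink] = True
    return mappable


# toy (source, sink, weight) edges over 5 regions
edges = [(0, 1, 3.0), (1, 1, 2.0), (3, 3, 0.0)]
print(mappable_from_edges(5, edges))  # [True, True, False, False, False]
```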
-
classmethod
merge
(pairs, *args, **kwargs)¶ Merge two or more
RegionPairsContainer
objects.
- Parameters
pairs – list of RegionPairsContainer
args – Positional arguments passed to constructor of this class
kwargs – Keyword arguments passed to constructor of this class
-
region_bins
(region)¶ Takes a genomic region and returns a slice of the bin indices that are covered by the region.
- Parameters
region – String or GenomicRegion object for which covered bins will be returned.
- Returns
slice
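The returned slice can be pictured as spanning the first through the last bin that overlaps the query. The sketch below is hypothetical (`covered_bins` is not FAN-C API) and assumes sorted 1-based inclusive regions, so that all overlapping bins on one chromosome are contiguous.

```python
# Hypothetical sorted regions in 1-based, inclusive coordinates
regions = [
    ("chr18", 1, 1_000_000),
    ("chr18", 1_000_001, 2_000_000),
    ("chr18", 2_000_001, 3_000_000),
    ("chr19", 1, 1_000_000),
    ("chr19", 1_000_001, 2_000_000),
]


def covered_bins(regions, chromosome, start, end):
    """Return slice(first, last + 1) over the regions overlapping the query."""
    hits = [i for i, (c, s, e) in enumerate(regions)
            if c == chromosome and s <= end and e >= start]
    if not hits:
        return slice(0, 0)  # empty slice when nothing overlaps
    return slice(hits[0], hits[-1] + 1)


bins = covered_bins(regions, "chr18", 1_500_000, 2_500_000)
print(bins)           # slice(1, 3, None)
print(regions[bins])  # the two overlapping chr18 bins
```

Returning a slice rather than a list means the result can index region lists or matrix axes directly.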
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or GenomicRegion object for which overlapping regions will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. The number of regions can also be obtained by calling len() on the object returned by this method.
- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs –
list of RegionBased objects
- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
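A sketch of the comparison described above, on plain (chromosome, start, end) tuples. The function name `regions_identical_sketch` is hypothetical; treating lists of different length as non-identical is an assumption of this sketch.

```python
def regions_identical_sketch(region_lists):
    """True if all lists have the same (chromosome, start, end) at every position."""
    first = region_lists[0]
    for other in region_lists[1:]:
        if len(other) != len(first):
            return False  # assumption: different lengths mean not identical
        for a, b in zip(first, other):
            if (a[0], a[1], a[2]) != (b[0], b[1], b[2]):
                return False
    return True


a = [("chr18", 1, 100), ("chr18", 101, 200)]
b = [("chr18", 1, 100), ("chr18", 101, 200)]
c = [("chr18", 1, 100), ("chr18", 101, 250)]
print(regions_identical_sketch([a, b]), regions_identical_sketch([a, b, c]))  # True False
```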
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this region
kwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this region
kwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this region
kwargs – Passed to
write_gff()
-
class
fanc.matrix.
RegionPairsTable
(file_name=None, mode='a', tmpdir=None, additional_region_fields=None, additional_edge_fields=None, partition_strategy='auto', _table_name_regions='regions', _table_name_edges='edges', _edge_buffer_size='3G', _edge_table_prefix='chrpair_')¶ Bases:
fanc.matrix.RegionPairsContainer
,fanc.general.Maskable
,fanc.regions.RegionsTable
HDF5 implementation of the
RegionPairsContainer
interface.-
class
ChromosomeDescription
¶ Bases:
tables.description.IsDescription
Description of the chromosomes in this object.
-
class
MaskDescription
¶ Bases:
tables.description.IsDescription
-
class
RegionDescription
¶ Bases:
tables.description.IsDescription
Description of a genomic region for PyTables Table
-
add_contact
(contact, *args, **kwargs)¶ Alias for
add_edge()
- Parameters
contact –
Edge
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
-
add_contacts
(contacts, *args, **kwargs)¶ Alias for
add_edges()
-
add_edge
(edge, check_nodes_exist=True, *args, **kwargs)¶ Add an edge / contact between two regions to this object.
- Parameters
edge –
Edge
, dict with at least the attributes source and sink, optionally weight, or a list of length 2 (source, sink) or 3 (source, sink, weight).
check_nodes_exist – Make sure that there are nodes that match source and sink indexes
args – Positional arguments passed to
_add_edge()
kwargs – Keyword arguments passed to
_add_edge()
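The accepted edge formats can be illustrated by a hypothetical normaliser that turns each of them into a (source, sink, weight) tuple. `normalize_edge` is not part of FAN-C; the default weight of 1 and the swap that enforces the source <= sink convention are assumptions of this sketch.

```python
def normalize_edge(edge):
    """Hypothetical helper: convert supported edge inputs to (source, sink, weight).

    Accepts a dict with 'source' and 'sink' (optional 'weight'), a list/tuple
    of length 2 or 3, or any object with source/sink (and optionally weight)
    attributes, such as an Edge. The default weight of 1 is an assumption.
    """
    if isinstance(edge, dict):
        source, sink, weight = edge["source"], edge["sink"], edge.get("weight", 1)
    elif isinstance(edge, (list, tuple)):
        if len(edge) == 2:
            (source, sink), weight = edge, 1
        elif len(edge) == 3:
            source, sink, weight = edge
        else:
            raise ValueError("list/tuple edges must have length 2 or 3")
    else:
        source, sink = edge.source, edge.sink
        weight = getattr(edge, "weight", 1)
    # assumption: enforce the source <= sink convention by swapping
    if source > sink:
        source, sink = sink, source
    return source, sink, weight


print(normalize_edge({"source": 5, "sink": 2, "weight": 3.5}))  # (2, 5, 3.5)
```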
-
add_edge_from_dict
(edge, *args, **kwargs)¶ Direct method to add an edge from dict input.
- Parameters
edge – dict with at least the keys “source” and “sink”. Additional keys will be loaded as edge attributes
-
add_edge_from_edge
(edge, *args, **kwargs)¶ Direct method to add an edge from
Edge
input.
- Parameters
edge –
Edge
-
add_edge_from_list
(edge, *args, **kwargs)¶ Direct method to add an edge from list or tuple input.
- Parameters
edge – List or tuple. Should be of length 2 (source, sink) or 3 (source, sink, weight)
-
add_edge_simple
(source, sink, weight=None, *args, **kwargs)¶ Direct method to add an edge from individual source, sink, and weight values.
- Parameters
source – Source region index
sink – Sink region index
weight – Weight of the edge
-
add_edges
(edges, flush=True, *args, **kwargs)¶ Bulk-add edges from a list.
List items can be any of the supported edge types: list, tuple, dict, or Edge. Repeatedly calls add_edge(), so it may be inefficient for large amounts of data.
- Parameters
edges – List (or iterator) of edges. See
add_edge()
for details
-
add_mask_description
(name, description)¶ Add a mask description to the _mask table and return its ID.
- Parameters
name (str) – name of the mask
description (str) – description of the mask
- Returns
id of the mask
- Return type
int
-
add_region
(region, *args, **kwargs)¶ Add a genomic region to this object.
This method offers some flexibility in the types of objects that can be loaded. See parameters for details.
- Parameters
region – Can be a
GenomicRegion
, a str in the form ‘<chromosome>:<start>-<end>[:<strand>]’, a dict with at least the fields ‘chromosome’, ‘start’, and ‘end’, optionally ‘ix’, or a list of length 3 (chromosome, start, end) or 4 (ix, chromosome, start, end).
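The accepted region formats can be illustrated with a hypothetical converter that maps each of them to a dict. `region_to_dict` is not FAN-C API; it handles only plain integer coordinates and assumes the strand is written as "+" or "-".

```python
import re

# '<chromosome>:<start>-<end>[:<strand>]'; strand as '+' or '-' is an assumption
_REGION_RE = re.compile(
    r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?$"
)


def region_to_dict(region):
    """Hypothetical helper: convert a region string, dict, or list to a dict."""
    if isinstance(region, str):
        m = _REGION_RE.match(region)
        if m is None:
            raise ValueError(f"not a region string: {region!r}")
        result = {"chromosome": m["chrom"], "start": int(m["start"]), "end": int(m["end"])}
        if m["strand"] is not None:
            result["strand"] = m["strand"]
        return result
    if isinstance(region, dict):
        return dict(region)
    if len(region) == 3:
        chromosome, start, end = region
        return {"chromosome": chromosome, "start": start, "end": end}
    if len(region) == 4:
        ix, chromosome, start, end = region
        return {"ix": ix, "chromosome": chromosome, "start": start, "end": end}
    raise ValueError("list regions must have length 3 or 4")


print(region_to_dict("chr18:1-1000000"))
# {'chromosome': 'chr18', 'start': 1, 'end': 1000000}
```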
-
add_regions
(regions, *args, **kwargs)¶ Bulk insert multiple genomic regions.
- Parameters
regions – List (or any iterator) with objects that describe a genomic region. See
add_region
for options.
-
static
bin_intervals
(intervals, bins, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into a fixed number of bins.
- Parameters
intervals – iterator of tuples (start, end, score)
bins – Number of bins to divide the region into
interval_range – Optional. Tuple (start, end) in base pairs of the range of intervals to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
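One plausible way to bin (start, end, score) intervals into a fixed number of bins is an overlap-weighted mean per bin. This is a hypothetical sketch (`bin_intervals_sketch` is not FAN-C's implementation); it assumes 1-based inclusive coordinates, returns NaN for bins without coverage, and omits the smoothing_window option.

```python
import math


def bin_intervals_sketch(intervals, bins, interval_range=None,
                         nan_replacement=None, zero_to_nan=False):
    """Overlap-weighted mean score in `bins` equal bins (illustrative only)."""
    intervals = list(intervals)
    if interval_range is None:
        interval_range = (min(s for s, _, _ in intervals),
                          max(e for _, e, _ in intervals))
    range_start, range_end = interval_range
    step = (range_end - range_start + 1) / bins  # inclusive coordinates
    result = []
    for b in range(bins):
        bin_start = range_start + b * step        # inclusive
        bin_end = range_start + (b + 1) * step    # exclusive
        covered = weighted = 0.0
        for start, end, score in intervals:
            # treat [start, end] as half-open [start, end + 1)
            overlap = min(end + 1, bin_end) - max(start, bin_start)
            if overlap > 0 and not math.isnan(score):
                covered += overlap
                weighted += overlap * score
        value = weighted / covered if covered > 0 else float("nan")
        if zero_to_nan and value == 0:
            value = float("nan")
        if nan_replacement is not None and math.isnan(value):
            value = nan_replacement
        result.append((int(round(bin_start)), int(round(bin_end)) - 1, value))
    return result


print(bin_intervals_sketch([(1, 100, 2.0), (101, 200, 4.0)], 1))  # [(1, 200, 3.0)]
```

Weighting by overlap length means a partially covered bin reflects the scores in proportion to how much of it each interval spans.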
-
static
bin_intervals_equidistant
(intervals, bin_size, interval_range=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False)¶ Bin a given set of intervals into bins with a fixed size.
- Parameters
intervals – iterator of tuples (start, end, score)
bin_size – Size of each bin in base pairs
interval_range – Optional. Tuple (start, end) in base pairs of the range of intervals to be binned. Useful if the intervals argument does not cover the exact genomic range to be binned.
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
- Returns
iterator of tuples: (start, end, score)
-
property
bin_size
¶ Return the length of the first region in the dataset.
Assumes all bins have equal size.
- Returns
int
-
binned_regions
(region=None, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, *args, **kwargs)¶ Same as region_intervals, but returns
GenomicRegion
objects instead of tuples.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of
GenomicRegion
objects
-
bins_to_distance
(bins)¶ Convert fraction of bins to base pairs
- Parameters
bins – float, fraction of bins
- Returns
int, base pairs
-
property
chromosome_bins
¶ Returns a dictionary of chromosomes and the start and end index of the bins they cover.
The start and end indices are range-compatible, i.e. chromosome bins [0, 5] cover bins 0, 1, 2, 3, and 4, but not bin 5.
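"Range-compatible" means the end index is exclusive, so the pair can be passed straight to Python's range() or used as a slice. The mapping below is toy data in the shape of chromosome_bins, and `chromosome_slice` is a hypothetical helper.

```python
# Toy chromosome_bins-style mapping: chromosome -> [first bin, end bin)
chromosome_bins = {"chrA": [0, 5], "chrB": [5, 8]}

# Toy per-bin labels standing in for regions or matrix rows
bin_labels = ["a0", "a1", "a2", "a3", "a4", "b0", "b1", "b2"]


def chromosome_slice(chromosome_bins, chromosome):
    """Hypothetical helper: slice selecting all bins of one chromosome."""
    start, end = chromosome_bins[chromosome]
    return slice(start, end)  # end is exclusive, like range()


for chromosome, (start, end) in chromosome_bins.items():
    print(chromosome, list(range(start, end)), bin_labels[chromosome_slice(chromosome_bins, chromosome)])
# chrA [0, 1, 2, 3, 4] ['a0', 'a1', 'a2', 'a3', 'a4']
# chrB [5, 6, 7] ['b0', 'b1', 'b2']
```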
-
property
chromosome_lengths
¶ Returns a dictionary of chromosomes and their length in bp.
-
chromosomes
()¶ List all chromosomes in this regions table.
- Returns
list of chromosome names.
-
close
(copy_tmp=True, remove_tmp=True)¶ Close this HDF5 file and run exit operations.
If file was opened with tmpdir in read-only mode: close file and delete temporary copy.
If file was opened with tmpdir in write or append mode: Replace original file with copy and delete copy.
- Parameters
copy_tmp – If False, does not overwrite original with modified file.
remove_tmp – If False, does not delete temporary copy of file.
-
distance_to_bins
(distance)¶ Convert base pairs to fraction of bins.
- Parameters
distance – distance in base pairs
- Returns
float, distance as fraction of bin size
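Under the equal-bin-size assumption stated for bin_size, the two conversions reduce to a division and a multiplication. The helpers below are hypothetical; truncating to int in `bins_to_distance` and computing the length as end - start + 1 (1-based inclusive coordinates, as in "chr18:1-1000000") are assumptions of this sketch.

```python
def bin_size_from_region(region):
    """Length of a (chromosome, start, end) region, 1-based inclusive."""
    _, start, end = region
    return end - start + 1


def distance_to_bins(distance, bin_size):
    """Base pairs -> (fractional) number of bins."""
    return distance / bin_size


def bins_to_distance(bins, bin_size):
    """(Fractional) number of bins -> base pairs, truncated to int (assumption)."""
    return int(bins * bin_size)


bin_size = bin_size_from_region(("chr18", 1, 1_000_000))
print(bin_size, distance_to_bins(2_500_000, bin_size))  # 1000000 2.5
```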
-
downsample
(n, file_name=None)¶ Sample edges from this object.
Sampling is always done on uncorrected Hi-C matrices.
- Parameters
n – Sample size or reference object. If n < 1, it will be interpreted as a fraction of the total reads in this object.
file_name – Output file name for down-sampled object.
- Returns
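Sampling on uncorrected matrices can be pictured as drawing individual reads without replacement from the raw contact counts. The sketch below shows one standard way to do that; `downsample_counts` is hypothetical and is not claimed to be FAN-C's procedure, and the "reference object" form of n is not covered. The raw counts are the ones printed by the norm=False example in the edges() documentation.

```python
import bisect
import itertools
import random


def downsample_counts(edges, n, seed=None):
    """Hypothetical helper: draw n reads without replacement from raw counts.

    edges: iterable of (source, sink, raw_count). If n < 1, it is treated as
    a fraction of the total number of reads. Returns {(source, sink): count}.
    """
    edges = list(edges)
    counts = [int(c) for _, _, c in edges]
    total = sum(counts)
    if n < 1:
        n = int(total * n)
    if n > total:
        raise ValueError("sample size exceeds the total number of reads")
    # cumulative counts map a read index to the edge it belongs to
    cumulative = list(itertools.accumulate(counts))
    rng = random.Random(seed)
    sampled = {}
    for read in rng.sample(range(total), n):
        i = bisect.bisect_right(cumulative, read)
        key = (edges[i][0], edges[i][1])
        sampled[key] = sampled.get(key, 0) + 1
    return sampled


# raw counts as printed by hic.edges('chr18', lazy=True, norm=False)
raw = [(42, 42, 2120.0), (24, 28, 388.0), (5, 76, 94.0), (66, 68, 470.0)]
sample = downsample_counts(raw, 0.1, seed=42)
print(sum(sample.values()))  # 307
```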
-
edge_data
(attribute, *args, **kwargs)¶ Iterate over specific edge attribute.
-
edge_subset
(key=None, *args, **kwargs)¶ Get a subset of edges.
This is an alias for edges().
- Returns
generator (
Edge
)
-
property
edges
¶ Iterate over contacts / edges.
edges() is the central function of RegionPairsContainer. Here, we will use the Hic implementation for demonstration purposes, but the usage is exactly the same for all compatible objects implementing RegionPairsContainer, including JuicerHic and CoolerHic.

    import fanc

    # file from FAN-C examples
    hic = fanc.load("output/hic/binned/fanc_example_1mb.hic")
We can easily find the number of edges in the sample Hic object:

    len(hic.edges)  # 8695
When used in an iterator context, edges() iterates over all edges in the RegionPairsContainer:

    for edge in hic.edges:
        # do something with edge
        print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...
Calling edges() as a method has the same effect:

    # note the '()'
    for edge in hic.edges():
        # do something with edge
        print(edge)
    # 42--42; bias: 5.797788472650082e-05; sink_node: chr18:42000001-43000000; source_node: chr18:42000001-43000000; weight: 0.12291311562018173
    # 24--28; bias: 6.496381719803623e-05; sink_node: chr18:28000001-29000000; source_node: chr18:24000001-25000000; weight: 0.025205961072838057
    # 5--76; bias: 0.00010230955745211447; sink_node: chr18:76000001-77000000; source_node: chr18:5000001-6000000; weight: 0.00961709840049876
    # 66--68; bias: 8.248432587969082e-05; sink_node: chr18:68000001-69000000; source_node: chr18:66000001-67000000; weight: 0.03876763316345468
    # ...
Rather than iterating over all edges in the object, we can select only a subset. If the key is a string or a GenomicRegion, all non-zero edges connecting the region described by the key to any other region are returned. If the key is a tuple of strings or GenomicRegion objects, only edges between the two regions are returned:

    # select all edges between chromosome 19
    # and any other region:
    for edge in hic.edges("chr19"):
        print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...

    # select all edges that are only on
    # chromosome 19
    for edge in hic.edges(('chr19', 'chr19')):
        print(edge)
    # 90--116; bias: 0.00021173151730025176; sink_node: chr19:37000001-38000000; source_node: chr19:11000001-12000000; weight: 0.009104455243910825
    # 135--135; bias: 0.00018003890596887822; sink_node: chr19:56000001-57000000; source_node: chr19:56000001-57000000; weight: 0.10028167062466517
    # 123--123; bias: 0.00011063368998965993; sink_node: chr19:44000001-45000000; source_node: chr19:44000001-45000000; weight: 0.1386240135570439
    # 92--93; bias: 0.00040851066434864896; sink_node: chr19:14000001-15000000; source_node: chr19:13000001-14000000; weight: 0.10090213409411629
    # ...

    # select inter-chromosomal edges
    # between chromosomes 18 and 19
    for edge in hic.edges(('chr18', 'chr19')):
        print(edge)
    # 49--106; bias: 0.00026372303696871666; sink_node: chr19:27000001-28000000; source_node: chr18:49000001-50000000; weight: 0.003692122517562033
    # 6--82; bias: 0.00021923129703834945; sink_node: chr19:3000001-4000000; source_node: chr18:6000001-7000000; weight: 0.0008769251881533978
    # 47--107; bias: 0.00012820949175399097; sink_node: chr19:28000001-29000000; source_node: chr18:47000001-48000000; weight: 0.0015385139010478917
    # 38--112; bias: 0.0001493344481069762; sink_node: chr19:33000001-34000000; source_node: chr18:38000001-39000000; weight: 0.0005973377924279048
    # ...
By default, edges() retrieves all edge attributes, which can be slow when iterating over many edges. This is why all file-based FAN-C RegionPairsContainer objects support lazy loading, where attributes are only read on demand:

    for edge in hic.edges('chr18', lazy=True):
        print(edge.source, edge.sink, edge.weight, edge)
    # 42 42 0.12291311562018173 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #0>
    # 24 28 0.025205961072838057 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #1>
    # 5 76 0.00961709840049876 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #2>
    # 66 68 0.03876763316345468 <fanc.matrix.LazyEdge for row /edges/chrpair_0_0.row (Row), pointing to row #3>
    # ...
Warning
The lazy iterator reuses the same LazyEdge object in every iteration and overwrites its attributes. Therefore, do not use lazy iterators if you need to store edge objects for later access. For example, list(hic.edges()) works as expected, with all Edge objects stored in the list, whereas list(hic.edges(lazy=True)) results in a list of identical LazyEdge objects. Always do all edge processing inside the loop when working with lazy iterators!

When working with normalised contact frequencies, such as those obtained through matrix balancing in the example above, edges() automatically returns normalised edge weights. In addition, the bias attribute will (typically) have a value different from 1.

When you are interested in the raw contact frequency, use the norm=False parameter:

    for edge in hic.edges('chr18', lazy=True, norm=False):
        print(edge.source, edge.sink, edge.weight)
    # 42 42 2120.0
    # 24 28 388.0
    # 5 76 94.0
    # 66 68 470.0
    # ...
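Comparing the normalised and raw outputs shown in this section suggests that each normalised weight equals the raw weight multiplied by the edge's bias attribute. The snippet below checks that relation on the four chr18 edges, with values copied from the examples; the relation is inferred from the printed numbers, not taken from the FAN-C source, and `normalised_weight` is a hypothetical helper.

```python
import math


def normalised_weight(raw_weight, bias):
    """Relation inferred from the printed examples: weight = raw weight * bias."""
    return raw_weight * bias


# (edge, raw weight from norm=False, bias, normalised weight), copied from above
examples = [
    ("42--42", 2120.0, 5.797788472650082e-05, 0.12291311562018173),
    ("24--28", 388.0, 6.496381719803623e-05, 0.025205961072838057),
    ("5--76", 94.0, 0.00010230955745211447, 0.00961709840049876),
    ("66--68", 470.0, 8.248432587969082e-05, 0.03876763316345468),
]

for edge, raw, bias, expected in examples:
    print(edge, math.isclose(normalised_weight(raw, bias), expected, rel_tol=1e-9))
# 42--42 True
# 24--28 True
# 5--76 True
# 66--68 True
```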
You can also omit all intra- or inter-chromosomal edges using intra_chromosomal=False or inter_chromosomal=False, respectively.
- Returns
Iterator over Edge or equivalent.
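The effect of the intra_chromosomal and inter_chromosomal switches can be mimicked on toy data. `filter_by_chromosome_type` is a hypothetical generator that reuses the keyword names for readability; it is not FAN-C's implementation.

```python
def filter_by_chromosome_type(edges, region_chromosome,
                              intra_chromosomal=True, inter_chromosomal=True):
    """Hypothetical helper: yield (source, sink, weight) edges, optionally
    dropping intra- or inter-chromosomal ones."""
    for source, sink, weight in edges:
        intra = region_chromosome[source] == region_chromosome[sink]
        if intra and not intra_chromosomal:
            continue  # skip edges within one chromosome
        if not intra and not inter_chromosomal:
            continue  # skip edges between chromosomes
        yield source, sink, weight


# toy data: region index -> chromosome, and (source, sink, weight) edges
region_chromosome = ["chrA", "chrA", "chrB"]
edges = [(0, 1, 2.0), (1, 2, 1.0), (2, 2, 5.0)]

print(list(filter_by_chromosome_type(edges, region_chromosome, inter_chromosomal=False)))
# [(0, 1, 2.0), (2, 2, 5.0)]
```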
-
edges_dict
(*args, **kwargs)¶ Edges iterator with access by bracket notation.
This iterator always returns unnormalised edges.
- Returns
dict or dict-like iterator
-
filter
(edge_filter, queue=False, log_progress=True)¶ Filter edges in this object by using a MaskFilter.
- Parameters
edge_filter – Class implementing MaskFilter.
queue – If True, the filter will be queued and can be executed along with other queued filters using run_queued_filters()
log_progress – If True, progress of iterating through all edges will be continuously reported.
-
find_region
(query_regions, _regions_dict=None, _region_ends=None, _chromosomes=None)¶ Find the region that is at the center of a region.
- Parameters
query_regions – Region selector string, GenomicRegion, or list of the former
- Returns
index (or list of indexes) of the region at the center of the query region
-
flush
(silent=False, update_mappability=True)¶ Write data to file and flush buffers.
- Parameters
silent – do not print flush progress
update_mappability – After writing data, update mappability and expected values
-
get_mask
(key)¶ Search _mask table for key and return Mask.
- Parameters
key (int or str) – mask ID (int) or mask name (str) to search for
- Returns
Mask
-
get_masks
(ix)¶ Extract mask IDs encoded in parameter and return masks.
IDs are powers of 2, so a single int field in the table can hold multiple masks by simply adding up the IDs. This is similar in principle to UNIX chmod permissions (although those use base 8).
- Parameters
ix (int) – integer that is the sum of powers of 2. Note that this value is not necessarily itself a power of 2.
- Returns
list of Masks extracted from ix
- Return type
list (Mask)
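The power-of-2 encoding can be decoded by testing each bit of the stored integer. `mask_ids` and `combine_masks` are hypothetical helpers: they return the individual IDs rather than Mask objects, and for distinct powers of 2, adding the IDs is the same as a bitwise OR.

```python
def mask_ids(ix):
    """Hypothetical helper: split an int into the powers of 2 it is the sum of."""
    if ix < 0:
        raise ValueError("mask field must be non-negative")
    ids = []
    bit = 1
    while ix:
        if ix & 1:
            ids.append(bit)
        ix >>= 1
        bit <<= 1
    return ids


def combine_masks(*ids):
    """Hypothetical helper: encode several power-of-2 mask IDs in one int."""
    combined = 0
    for mask_id in ids:
        combined |= mask_id  # equals addition for distinct powers of 2
    return combined


print(mask_ids(13))  # [1, 4, 8]
```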
-
intervals
(*args, **kwargs)¶ Alias for region_intervals.
-
mappable
(region=None)¶ Get the mappability of regions in this object.
A “mappable” region has at least one contact to another region in the genome.
- Returns
array
where True means mappable and False unmappable
-
classmethod
merge
(pairs, *args, **kwargs)¶ Merge two or more
RegionPairsTable
objects.
- Parameters
pairs – list of
RegionPairsTable
- Returns
merged
RegionPairsTable
-
region_bins
(*args, **kwargs)¶ Return slice of start and end indices spanned by a region.
- Parameters
args – provide a
GenomicRegion
here to get the slice of start and end bins of only this region. To get the slice over all regions, leave this blank.
- Returns
-
region_data
(key, value=None)¶ Retrieve or add vector data to this object. If there is existing data in this object with the same name, it will be replaced.
- Parameters
key – Name of the data column
value – vector with region-based data (one entry per region)
-
region_intervals
(region, bins=None, bin_size=None, smoothing_window=None, nan_replacement=None, zero_to_nan=False, score_field='score', *args, **kwargs)¶ Return equally-sized genomic intervals and associated scores.
Use either bins or bin_size argument to control binning.
- Parameters
region – String or GenomicRegion object denoting the region to be binned
bins – Number of bins to divide the region into
bin_size – Size of each bin (alternative to bins argument)
smoothing_window – Size of window (in bins) to smooth scores over
nan_replacement – NaN values in the scores will be replaced with this value
zero_to_nan – If True, will convert bins with score 0 to NaN
args – Arguments passed to _region_intervals
kwargs – Keyword arguments passed to _region_intervals
- Returns
iterator of tuples: (start, end, score)
-
region_subset
(region, *args, **kwargs)¶ Takes a GenomicRegion and returns all regions that overlap with the supplied region.
- Parameters
region – String or GenomicRegion object for which overlapping regions will be returned.
-
property
regions
¶ Iterate over genomic regions in this object.
Will return a
GenomicRegion
object in every iteration. The number of regions can also be obtained by calling len() on the object returned by this method.
- Returns
RegionIter
-
regions_and_edges
(key, *args, **kwargs)¶ Convenient access to regions and edges selected by key.
-
property
regions_dict
¶ Return a dictionary with region index as keys and regions as values.
- Returns
dict {region.ix: region, …}
-
static
regions_identical
(pairs)¶ Check if the regions in all objects in the list are identical.
- Parameters
pairs –
list of RegionBased objects
- Returns
True if chromosome, start, and end are identical between all regions in the same list positions.
-
run_queued_filters
(log_progress=True)¶ Run queued filters.
- Parameters
log_progress – If True, progress of iterating through all edges will be continuously reported.
-
subset
(*regions, **kwargs)¶ Subset a Hic object by specifying one or more subset regions.
- Parameters
regions – string or GenomicRegion object(s)
kwargs – Supports file_name: destination file name of the subset Hic object; tmpdir: if True, works in a temporary directory until the object is closed. Additional parameters are passed to
edges()
- Returns
Hic
-
to_bed
(file_name, subset=None, **kwargs)¶ Export regions as BED file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this region
kwargs – Passed to
write_bed()
-
to_bigwig
(file_name, subset=None, **kwargs)¶ Export regions as BigWig file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this region
kwargs – Passed to
write_bigwig()
-
to_gff
(file_name, subset=None, **kwargs)¶ Export regions as GFF file.
- Parameters
file_name – Path of file to write regions to
subset – optional
GenomicRegion
or str to write only regions overlapping this region
kwargs – Passed to
write_gff()
-
class