Map Genomic Ranges to genes using defined regulatory features
mapByFeature(
gr,
genes,
prom,
enh,
gi,
cols = c("gene_id", "gene_name", "symbol"),
gr2prom = 0,
gr2enh = 0,
gr2gi = 0,
gr2gene = 1e+05,
prom2gene = 0,
enh2gene = 1e+05,
gi2gene = 0,
...
)
gr: GRanges object with query ranges to be mapped to genes
genes: GRanges object containing genes (or any other nominal feature) to be assigned
prom: GRanges object defining promoters
enh: GRanges object defining enhancers
gi: GInteractions object defining interactions. Mappings from interactions to genes should be performed as a separate prior step.
cols: Column names to be assigned as mcols in the output. At a minimum, these columns must be present in genes. If all requested columns are found in any of prom, enh or gi, these pre-existing mappings will be preferentially used. Any columns not found in utilised reference objects will be ignored.
gr2prom: The maximum permissible distance between a query range and any ranges defined as promoters
gr2enh: The maximum permissible distance between a query range and any ranges defined as enhancers
gr2gi: The maximum permissible distance between a query range and any ranges defined as GInteractions anchors
gr2gene: The maximum permissible distance between a query range and genes (for ranges not otherwise mapped)
prom2gene: The maximum permissible distance between a range provided in prom and a gene
enh2gene: The maximum permissible distance between a range provided in enh and a gene
gi2gene: The maximum permissible distance between a GInteractions anchor (provided in gi) and a gene
...: Passed to findOverlaps and overlapsAny internally
A GRanges object with added mcols as specified
This function uses feature-level information and long-range interactions to improve the mapping of regions to genes. If regulatory features are provided, ranges are mapped to genes using those features as a framework. The following sequential strategy is used:
1. Ranges overlapping a promoter are assigned to that gene
2. Ranges overlapping an enhancer are assigned to all genes within a specified distance
3. Ranges overlapping a long-range interaction are assigned to all genes connected by the interaction
4. Ranges with no gene assignment from the previous steps are assigned to all overlapping genes or the nearest gene within a specified distance
If information is missing for one of these steps, the algorithm simply proceeds to the next step. If no promoter, enhancer or interaction data is provided, all ranges are mapped using step 4 alone. A range can be mapped by any or all of the first three steps, but step 4 is only applied to ranges that were not mapped by any of the first three steps.
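The interplay between the enhancer step and the fallback step can be sketched as follows. This is a minimal illustration, assuming the GenomicRanges and extraChIPs packages are installed; the coordinates and the enh object are hypothetical toy values, and no output is shown.

```r
library(GenomicRanges)
library(extraChIPs)

## Hypothetical toy genes on a single chromosome
genes <- GRanges(c("chr1:2-10:*", "chr1:25-30:-", "chr1:31-40:+"))
genes$gene_id <- paste0("gene", seq_along(genes))

## A hypothetical enhancer lying between gene1 and gene2
enh <- GRanges("chr1:14-16")

## Two query ranges: the first overlaps the enhancer, the second overlaps
## no regulatory feature
gr <- GRanges(c("chr1:15", "chr1:60"))

## No promoters or interactions are supplied, so those steps are skipped.
## The enhancer step uses enh2gene; the fallback step uses gr2gene and is
## only applied to ranges left unmapped by the earlier steps.
mapped <- mapByFeature(gr, genes, enh = enh, enh2gene = 10, gr2gene = 25)

## Inspect the genes assigned to each query range
mapped$gene_id
```

Comparing this with a call that omits enh shows which assignments come from the enhancer step and which from the fallback step.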
Distances between each set of features and the query range can be individually specified by modifying the gr2prom, gr2enh, gr2gi or gr2gene parameters. Distances between features and genes can also be set using the parameters prom2gene, enh2gene and gi2gene.
Additionally, if previously defined mappings are included with any of the prom, enh or gi objects, these will be used in preference to any obtained from the genes object.
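As a rough sketch of how the distance parameters and pre-existing mappings might be used together, the code below builds promoters that already carry a gene_id column and then relaxes gr2prom. It assumes the GenomicRanges and extraChIPs packages are installed; the toy coordinates and the chosen distances are illustrative only, and no output is shown.

```r
library(GenomicRanges)
library(extraChIPs)

## Toy genes with a single identifier column
genes <- GRanges(c("chr1:2-10:*", "chr1:25-30:-", "chr1:31-40:+"))
genes$gene_id <- paste0("gene", seq_along(genes))

## promoters() keeps the mcols of its input, so prom already carries gene_id
## and can act as a previously defined promoter-to-gene mapping
prom <- promoters(genes, upstream = 1, downstream = 1)
has_mapping <- "gene_id" %in% colnames(mcols(prom))

## Query ranges to map
gr <- GRanges(paste0("chr1:", seq(0, 60, by = 15)))

## Default gr2prom = 0: only ranges directly overlapping a promoter are
## handled by the promoter step
strict <- mapByFeature(gr, genes, prom, gr2gene = 25)

## Illustrative gr2prom = 5: ranges within 5bp of a promoter are also
## handled by the promoter step
relaxed <- mapByFeature(gr, genes, prom, gr2prom = 5, gr2gene = 25)
```

Comparing strict$gene_id with relaxed$gene_id shows which query ranges change assignment once nearby promoters are allowed to contribute.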
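## Load the packages providing GRanges() and mapByFeature()
library(GenomicRanges)
library(extraChIPs)
## Define some genes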
genes <- GRanges(c("chr1:2-10:*", "chr1:25-30:-", "chr1:31-40:+"))
genes$gene_id <- paste0("gene", seq_along(genes))
genes
#> GRanges object with 3 ranges and 1 metadata column:
#> seqnames ranges strand | gene_id
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr1 2-10 * | gene1
#> [2] chr1 25-30 - | gene2
#> [3] chr1 31-40 + | gene3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Add a promoter for each gene
prom <- promoters(genes, upstream = 1, downstream = 1)
prom
#> GRanges object with 3 ranges and 1 metadata column:
#> seqnames ranges strand | gene_id
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr1 1-2 * | gene1
#> [2] chr1 30-31 - | gene2
#> [3] chr1 30-31 + | gene3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Some ranges to map
gr <- GRanges(paste0("chr1:", seq(0, 60, by = 15)))
gr
#> GRanges object with 5 ranges and 0 metadata columns:
#> seqnames ranges strand
#> <Rle> <IRanges> <Rle>
#> [1] chr1 0 *
#> [2] chr1 15 *
#> [3] chr1 30 *
#> [4] chr1 45 *
#> [5] chr1 60 *
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Map so that any gene within 25bp of the range is assigned
mapByFeature(gr, genes, gr2gene = 25)
#> GRanges object with 5 ranges and 1 metadata column:
#> seqnames ranges strand | gene_id
#> <Rle> <IRanges> <Rle> | <CharacterList>
#> [1] chr1 0 * | gene1
#> [2] chr1 15 * | gene1
#> [3] chr1 30 * | gene2
#> [4] chr1 45 * | gene3
#> [5] chr1 60 * | gene3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
## Now use promoters to be more accurate in the gene assignment
## Given that the first range overlaps the promoter of gene1, this is a
## more targeted approach. Similarly for the third range
mapByFeature(gr, genes, prom, gr2gene = 25)
#> GRanges object with 5 ranges and 1 metadata column:
#> seqnames ranges strand | gene_id
#> <Rle> <IRanges> <Rle> | <CharacterList>
#> [1] chr1 0 * | gene1
#> [2] chr1 15 * | gene1
#> [3] chr1 30 * | gene2,gene3
#> [4] chr1 45 * | gene3
#> [5] chr1 60 * | gene3
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths