Form a set of ranges from y whose widths (near) exactly match those in x, for use as a background set where matching is required

makeRMRanges(x, y, ...)

# S4 method for class 'GRanges,GRanges'
makeRMRanges(
  x,
  y,
  exclude = GRanges(),
  n_iter = 1,
  n_total = NULL,
  replace = TRUE,
  ...,
  force_ol = TRUE
)

# S4 method for class 'GRangesList,GRangesList'
makeRMRanges(
  x,
  y,
  exclude = GRanges(),
  n_iter = 1,
  n_total = NULL,
  replace = TRUE,
  mc.cores = 1,
  ...,
  force_ol = TRUE,
  unlist = TRUE
)

Arguments

x

GRanges/GRangesList with ranges to be matched

y

GRanges/GRangesList with ranges to select random matching ranges from

...

Not used

exclude

GRanges of ranges to omit from testing

n_iter

The number of times to repeat the random selection process

n_total

Setting this value will override anything set using n_iter. Can be a vector of any length, corresponding to the length of x, when x is a GRangesList

replace

logical(1) Sample with or without replacement when creating the set of random ranges.

force_ol

logical(1) Enforce an overlap between every site in x and y

mc.cores

Passed to mclapply

unlist

logical(1) Return as a sorted GRanges object, or leave as a GRangesList

Value

A GRanges or GRangesList object

Details

This function uses the width distribution of the 'test' ranges (i.e. x) to randomly sample a set of ranges with matching width from the ranges provided in y. The width distribution will clearly be exact when a set of fixed-width ranges is passed to x, whilst random sampling may yield some variability when matching ranges of variable width.
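
The idea of width-matched sampling can be sketched in base R. This is an illustration only, not the package's implementation: the helper sample_matched() is hypothetical, a single interval stands in for y, and the sketch reproduces every width exactly, whereas the function itself may show some variability for variable-width ranges.

```r
## Hypothetical helper, not part of the package: draw one random range for
## each value in `widths`, placed uniformly inside the interval [start, end]
sample_matched <- function(widths, start, end) {
  ## The latest permissible start still leaves room for the full width
  max_start <- end - widths + 1
  stopifnot(all(max_start >= start))
  ## Uniform random offset from `start` for each range
  offset <- floor(runif(length(widths)) * (max_start - start + 1))
  new_start <- start + offset
  data.frame(start = new_start, end = new_start + widths - 1)
}

set.seed(1)
test_widths <- c(400, 400, 250, 600) # widths of the 'test' ranges (assumed)
bg <- sample_matched(test_widths, start = 1, end = 1e6)
## The sampled widths reproduce the test widths
bg$end - bg$start + 1
```

Only the widths are carried over from the test ranges; the positions are drawn independently of where the test ranges sit.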

When both x and y are GRanges objects, they are implicitly assumed to both represent similar ranges, such as those overlapping a promoter or enhancer. When passing two GRangesList objects, both objects are expected to contain ranges annotated as belonging to key features, such that the list elements in y must encompass all elements in x. For example, if x contains two elements named 'promoter' and 'intron', y should also contain elements named 'promoter' and 'intron' and these will be sampled as matching ranges for the same element in x. If elements of x and y are not named, they are assumed to be in matching order.
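
The pairing of list elements described above can be sketched with plain named lists standing in for the two GRangesList objects. The element contents and the extra 'enhancer' element are made up for illustration, and the checks shown are an assumption about reasonable behaviour rather than the function's actual code.

```r
## Plain named lists stand in for the two GRangesList objects
x <- list(promoter = c(400, 400), intron = c(250, 600, 300))
y <- list(
  intron = "intron background",
  promoter = "promoter background",
  enhancer = "not needed by x"
)

if (!is.null(names(x)) && !is.null(names(y))) {
  ## Named lists: every element of x needs a partner of the same name in y
  stopifnot(all(names(x) %in% names(y)))
  y_matched <- y[names(x)]
} else {
  ## Unnamed lists: elements are paired by position
  stopifnot(length(y) >= length(x))
  y_matched <- y[seq_along(x)]
}
names(y_matched)
```

After matching, y_matched is in the same order as x, so each element of x can be sampled against its own background.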

The default behaviour is to assume that randomly-generated ranges are for iteration, and as such, ranges are randomly formed in multiples of the number of 'test' ranges provided in x. The column iteration will be added to the returned ranges. Passing any number to the n_total argument will instead select the total number of ranges specified. In this case, no iteration column will be included in the returned ranges.
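
The two modes can be compared in a small base R sketch that tracks only widths. The helper sample_widths() is hypothetical, and the way widths are drawn when n_total is set (resampling the test widths with replacement) is an assumption made for illustration.

```r
## Hypothetical helper, not part of the package: shows how n_iter and
## n_total change the size and columns of the result
sample_widths <- function(widths, n_iter = 1, n_total = NULL) {
  if (is.null(n_total)) {
    ## Iteration mode: one full set of test widths per iteration
    data.frame(
      width = rep(widths, times = n_iter),
      iteration = rep(seq_len(n_iter), each = length(widths))
    )
  } else {
    ## Fixed-total mode: n_iter is ignored and no iteration column is added
    ## (assumption: widths are resampled from the test widths)
    data.frame(width = sample(widths, n_total, replace = TRUE))
  }
}

set.seed(1)
test_widths <- c(400, 400, 250, 600, 300)
by_iter <- sample_widths(test_widths, n_iter = 2)
by_total <- sample_widths(test_widths, n_iter = 2, n_total = 3)
dim(by_iter)    # rows = n_iter * length(test_widths), with an iteration column
dim(by_total)   # rows = n_total, width column only
```

In iteration mode the result can be split on the iteration column to obtain one matched background set per iteration.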

Sampling is assumed to be with replacement as this is most suitable for bootstrapping and related procedures, although this can be disabled by setting replace = FALSE.

Examples

## Load the example peaks
data("ar_er_peaks")
sq <- seqinfo(ar_er_peaks)
## Now sample size-matched ranges for two iterations from chr1
makeRMRanges(ar_er_peaks, GRanges(sq)[1], n_iter = 2)
#> GRanges object with 458 ranges and 1 metadata column:
#>         seqnames              ranges strand | iteration
#>            <Rle>           <IRanges>  <Rle> | <integer>
#>     [1]     chr1       238477-238876      * |         1
#>     [2]     chr1       462793-463192      * |         1
#>     [3]     chr1     1008010-1008409      * |         2
#>     [4]     chr1     1940152-1940551      * |         1
#>     [5]     chr1     2166468-2166867      * |         2
#>     ...      ...                 ...    ... .       ...
#>   [454]     chr1 245120112-245120511      * |         2
#>   [455]     chr1 246167695-246168094      * |         1
#>   [456]     chr1 246216201-246216600      * |         1
#>   [457]     chr1 246593982-246594381      * |         2
#>   [458]     chr1 248404146-248404545      * |         1
#>   -------
#>   seqinfo: 24 sequences from hg19 genome

## Or simply sample 100 ranges if not planning any iterative analyses
makeRMRanges(ar_er_peaks, GRanges(sq)[1], n_total = 100)
#> GRanges object with 100 ranges and 0 metadata columns:
#>         seqnames              ranges strand
#>            <Rle>           <IRanges>  <Rle>
#>     [1]     chr1       335925-336324      *
#>     [2]     chr1     1785328-1785727      *
#>     [3]     chr1     1866466-1866865      *
#>     [4]     chr1     3220233-3220632      *
#>     [5]     chr1     4276478-4276877      *
#>     ...      ...                 ...    ...
#>    [96]     chr1 239863778-239864177      *
#>    [97]     chr1 241539154-241539553      *
#>    [98]     chr1 245303949-245304348      *
#>    [99]     chr1 245378676-245379075      *
#>   [100]     chr1 248443254-248443653      *
#>   -------
#>   seqinfo: 24 sequences from hg19 genome