Get coverage Profile Data surrounding specified ranges

getProfileData(x, gr, ...)

# S4 method for class 'BigWigFile,GenomicRanges'
getProfileData(
  x,
  gr,
  upstream = 2500,
  downstream = upstream,
  bins = 100,
  mean_mode = "w0",
  log = TRUE,
  offset = 1,
  n_max = Inf,
  ...
)

# S4 method for class 'BigWigFileList,GenomicRanges'
getProfileData(
  x,
  gr,
  upstream = 2500,
  downstream = upstream,
  bins = 100,
  mean_mode = "w0",
  log = TRUE,
  offset = 1,
  BPPARAM = SerialParam(),
  ...
)

# S4 method for class 'character,GenomicRanges'
getProfileData(
  x,
  gr,
  upstream = 2500,
  downstream = upstream,
  bins = 100,
  mean_mode = "w0",
  log = TRUE,
  offset = 1,
  ...
)

Arguments

x

A BigWigFile or BigWiFileList

gr

A GRanges object

...

Passed to normalizeToMatrix

upstream

The distance to extend upstream from the centre of each range within gr

downstream

The distance to extend downstream from the centre of each range within gr

bins

The total number of bins to break the extended ranges into

mean_mode

The method used for calculating the score for each bin. See normalizeToMatrix for details

log

logical(1) Should the returned values be log2-transformed

offset

Value added to data if log-transforming. Ignored otherwise

n_max

Upper limit on the number of ranges to return profile data for. By default, no limit will be applied .

BPPARAM

Passed internally to bplapply

Value

GRanges or GrangesList with column profile_data, as described above

Details

This will take all provided ranges and set as identical width ranges, extending by the specified amount both up and downstream of the centre of the provided ranges. By default, the ranges extensions are symmetrical and only the upstream range needs to be specified, however this parameterisation allows for non-symmetrical ranges to be generated.

These uniform width ranges will then be used to extract the value contained in the score field from one or more BigWigFiles. Uniform width ranges are then broken into bins of equal width and the average score found within each bin.

The binned profiles are returned as a DataFrameList called profile_data as a column within the resized GRanges object. Column names in each DataFrame are score, position and bp.

If passing a BigWigFileList, profiles will be obtained in series by default. To run in parallel pass a MulticoreParam object to the BPPARAM argument.

Examples

bw <- system.file("tests", "test.bw", package = "rtracklayer")
gr <- GRanges("chr2:1000")
pd <- getProfileData(bw, gr, upstream = 500, bins = 10)
pd
#> GRanges object with 1 range and 1 metadata column:
#>       seqnames    ranges strand |                              profile_data
#>          <Rle> <IRanges>  <Rle> |                      <SplitDataFrameList>
#>   [1]     chr2  500-1499      * | -2.00:u1:-450,-1.02:u2:-350,-1.00:u3:-250
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
pd$profile_data
#> SplitDataFrameList of length 1
#> [[1]]
#> DataFrame with 10 rows and 3 columns
#>          score position        bp
#>      <numeric> <factor> <numeric>
#> 1  -2.00000000       u1      -450
#> 2  -1.02000000       u2      -350
#> 3  -1.00000000       u3      -250
#> 4  -1.00000000       u4      -150
#> 5  -0.42673675       u5       -50
#> 6  -0.41503750       d1        50
#> 7  -0.41503750       d2       150
#> 8  -0.00830075       d3       250
#> 9   0.00000000       d4       350
#> 10  0.00000000       d5       450
#>