Get Profile Data surrounding specified ranges — getProfileData • extraChIPs

Get coverage Profile Data surrounding specified ranges

getProfileData(x, gr, ...)

# S4 method for class 'BigWigFile,GenomicRanges'
getProfileData(
  x,
  gr,
  upstream = 2500,
  downstream = upstream,
  bins = 100,
  mean_mode = "w0",
  log = TRUE,
  offset = 1,
  n_max = Inf,
  ...
)

# S4 method for class 'BigWigFileList,GenomicRanges'
getProfileData(
  x,
  gr,
  upstream = 2500,
  downstream = upstream,
  bins = 100,
  mean_mode = "w0",
  log = TRUE,
  offset = 1,
  BPPARAM = SerialParam(),
  ...
)

# S4 method for class 'character,GenomicRanges'
getProfileData(
  x,
  gr,
  upstream = 2500,
  downstream = upstream,
  bins = 100,
  mean_mode = "w0",
  log = TRUE,
  offset = 1,
  ...
)

Arguments

x: A BigWigFile or BigWiFileList
gr: A GRanges object
...: Passed to normalizeToMatrix
upstream: The distance to extend upstream from the centre of each range within gr
downstream: The distance to extend downstream from the centre of each range within gr
bins: The total number of bins to break the extended ranges into
mean_mode: The method used for calculating the score for each bin. See normalizeToMatrix for details
log: logical(1) Should the returned values be log2-transformed
offset: Value added to data if log-transforming. Ignored otherwise
n_max: Upper limit on the number of ranges to return profile data for. By default, no limit will be applied .
BPPARAM: Passed internally to bplapply

Value

GRanges or GrangesList with column profile_data, as described above

Details

This will take all provided ranges and set as identical width ranges, extending by the specified amount both up and downstream of the centre of the provided ranges. By default, the ranges extensions are symmetrical and only the upstream range needs to be specified, however this parameterisation allows for non-symmetrical ranges to be generated.

These uniform width ranges will then be used to extract the value contained in the score field from one or more BigWigFiles. Uniform width ranges are then broken into bins of equal width and the average score found within each bin.

The binned profiles are returned as a DataFrameList called profile_data as a column within the resized GRanges object. Column names in each DataFrame are score, position and bp.

If passing a BigWigFileList, profiles will be obtained in series by default. To run in parallel pass a MulticoreParam object to the BPPARAM argument.

Examples

bw <- system.file("tests", "test.bw", package = "rtracklayer")
gr <- GRanges("chr2:1000")
pd <- getProfileData(bw, gr, upstream = 500, bins = 10)
pd
#> GRanges object with 1 range and 1 metadata column:
#>       seqnames    ranges strand |                              profile_data
#>          <Rle> <IRanges>  <Rle> |                      <SplitDataFrameList>
#>   [1]     chr2  500-1499      * | -2.00:u1:-450,-1.02:u2:-350,-1.00:u3:-250
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
pd$profile_data
#> SplitDataFrameList of length 1
#> [[1]]
#> DataFrame with 10 rows and 3 columns
#>          score position        bp
#>      <numeric> <factor> <numeric>
#> 1  -2.00000000       u1      -450
#> 2  -1.02000000       u2      -350
#> 3  -1.00000000       u3      -250
#> 4  -1.00000000       u4      -150
#> 5  -0.42673675       u5       -50
#> 6  -0.41503750       d1        50
#> 7  -0.41503750       d2       150
#> 8  -0.00830075       d3       250
#> 9   0.00000000       d4       350
#> 10  0.00000000       d5       450
#>