Merge overlapping windows using p-values from significance testing

mergeBySig(x, ...)

# S4 method for class 'GenomicRanges'
mergeBySig(
  x,
  df = NULL,
  logfc = "logFC",
  pval = "P",
  cpm = "logCPM",
  inc_cols,
  p_adj_method = "fdr",
  alpha = 0.05,
  method = c("combine", "best", "minimal"),
  merge_within = 1L,
  ignore_strand = TRUE,
  min_win = 1,
  ...
)

# S4 method for class 'RangedSummarizedExperiment'
mergeBySig(
  x,
  df = NULL,
  logfc = "logFC",
  pval = "P",
  cpm = "logCPM",
  inc_cols,
  p_adj_method = "fdr",
  alpha = 0.05,
  method = c("combine", "best", "minimal"),
  merge_within = 1L,
  ignore_strand = TRUE,
  ...
)

Arguments

x

GenomicRanges object

...

Passed to all csaw functions being wrapped

df

data.frame with results of differential binding analysis performed using a sliding window strategy. If not provided, the columns in the mcols() element of x will be used

logfc, pval, cpm

Column names for the values holding window specific estimates of change in binding (logfc), overall signal intensity (cpm) and the significance from statistical testing (pval)

inc_cols

(Optional) Character vector of any additional columns in df to return

p_adj_method

One of p.adjust.methods

alpha

Significance threshold to apply during internal calculations

method

Shorthand versions for which csaw strategy to use for merging windows. Choose from 'combine' (combineTests), 'best' (getBestTest) or 'minimal' (minimalTests).

merge_within

Merge any non-overlapping windows within this distance

ignore_strand

Passed internally to reduce and findOverlaps

min_win

Only keep merged windows derived from at least this number

Value

A GenomicRanges object with overlapping ranges from the original object merged and representative values returned. The range corresponding to the representative values is also returned

Details

When using sliding windows to test for differential signal, overlapping windows can be merged based on the significance of results. mergeBySig() is a wrapper to the functions combineTests, getBestTest and minimalTests, using each function's approach to finding a representative window. The returned object differs from those returned by the original functions in that the description of windows as 'up', 'down' or mixed is omitted and the genomic range corresponding to the representative window is also returned. Column names also correspond to those in the original object.

An additional column with adjusted p-values is returned. This column retains the same name as the original but with the suffix '_*' added where the p-value adjustment method is added after the underscore.

Examples

x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
set.seed(1001)
df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10))
mcols(x) <- df
mergeBySig(x, pval = "p", method = "combine")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         1         1    chr1:1-10
#>   [2]     chr1     51-60      * |         1         0         0   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   5.49346  2.188648 0.0184835  0.036967
#>   [2]   7.85644 -0.185275 0.3201591  0.320159
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeBySig(x, pval = "p", method = "best")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         1         1    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         0   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   7.44269 -0.177547 0.0252419 0.0504838
#>   [2]   7.85644 -0.185275 0.3201591 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeBySig(x, pval = "p", method = "min")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         1         1    chr1:1-10
#>   [2]     chr1     51-60      * |         1         0         0   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   5.49346  2.188648 0.0252419 0.0504838
#>   [2]   7.85644 -0.185275 0.3201591 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths