Merge overlapping windows using harmonic mean p-values from significance testing

mergeByHMP(x, ...)

# S4 method for class 'GenomicRanges'
mergeByHMP(
  x,
  df = NULL,
  w = NULL,
  logfc = "logFC",
  pval = "P",
  cpm = "logCPM",
  inc_cols = NULL,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = TRUE,
  min_win = 1,
  keyval = c("min", "merged"),
  hm_pre = "hm",
  ...
)

# S4 method for class 'RangedSummarizedExperiment'
mergeByHMP(
  x,
  df = NULL,
  w = NULL,
  logfc = "logFC",
  pval = "P",
  cpm = "logCPM",
  inc_cols = NULL,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = FALSE,
  hm_pre = "hm",
  ...
)

Arguments

x

GenomicRanges object

...

Not used

df

data.frame with results of differential binding analysis performed using a sliding window strategy. If not provided, the columns in the mcols() element of x will be used

w

vector of weights to applied when calculating harmonic mean p-values

logfc, pval, cpm

Column names for the values holding window specific estimates of change in binding (logfc), overall signal intensity (cpm) and the significance from statistical testing (pval).

inc_cols

(Optional) Character vector of any additional columns in df to return. Values will correspond to the range in the keyval_range column

p_adj_method

One of p.adjust.methods or "fwer". If "fwer" is specified the adjusted harmonic-mean p-value will be returned in a form which strictly controls the experiment-wide FWER. Please see vignette("harmonicmeanp") for more details

merge_within

Merge any non-overlapping windows within this distance

ignore_strand

Passed internally to reduce and findOverlaps

min_win

Only keep merged windows derived from at least this number

keyval

Return the key-value range as the window associated with the minimum p-value, or by merging the ranges from all windows with raw p-values below the merged harmonic-mean p-value

hm_pre

Prefix to add to the beginning of all HMP-derived columns

Value

A GenomicRanges object with merged ranges from the original object along with summarised or representative values from the relevant columns. The range corresponding to a representative values is also returned as described above

Details

When using sliding windows to test for differential signal, overlapping windows can be merged based on the significance of results. mergeByHMP() merges overlapping windows using the asymptotically exact harmonic mean p-value p.hmp from the individual, window-level tests. This tests the Null Hypothesis that there is no significance amongst the initial set of p-values, and returns a summarised value which controls the FDR within a set of tests (Wilson, PNAS, 2019). Multilevel testing across the set of results is currently implemented using p_adj_method = "fwer"

Given that the harmonic mean p-value is calculated from the inverse p-values, these are used to provide a weighted average of expression and logFC values in the returned object. Any weights provided in w are ignored for these values as they are simple representative estimates. The representative range returned in keyval_range corresponds to the window with the lowest p-value.

The total number of windows is also returned in the final object, with the summarised values n_up and n_down indicating the number of windows with raw p-values below the calculated harmonic mean p-value, and with the corresponding direction of change.

The column containing the harmonic mean p-values is returned as 'hmp'. An additional column with adjusted hmp-values is returned with the suffix '_*' added where the p-value adjustment method is added after the underscore.

Examples

x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
set.seed(1001)
df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10))
mergeByHMP(x, df, pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         1         0    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         1   chr1:51-60
#>          logCPM     logFC       hmp   hmp_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   6.65177  0.782560 0.0161575 0.0323149
#>   [2]   7.85644 -0.185275 0.3592014 0.3592014
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mcols(x) <- df
x
#> GRanges object with 3 ranges and 3 metadata columns:
#>       seqnames    ranges strand |     logFC    logCPM         p
#>          <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric>
#>   [1]     chr1      1-10      * |  2.188648   5.49346 0.0184835
#>   [2]     chr1      6-15      * | -0.177547   7.44269 0.0126209
#>   [3]     chr1     51-60      * | -0.185275   7.85644 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeByHMP(x, pval = "p", p_adj_method = "fwer")
#> GRanges object with 2 ranges and 9 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         1         0    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         1   chr1:51-60
#>          logCPM     logFC       hmp subjectHits  hmp_fwer
#>       <numeric> <numeric> <numeric>   <integer> <numeric>
#>   [1]   6.65177  0.782560 0.0161575           1 0.0251708
#>   [2]   7.85644 -0.185275 0.3592014           2 0.7995503
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths