Merge overlapping windows using harmonic mean p-values from significance testing
mergeByHMP(x, ...)
# S4 method for class 'GenomicRanges'
mergeByHMP(
x,
df = NULL,
w = NULL,
logfc = "logFC",
pval = "P",
cpm = "logCPM",
inc_cols = NULL,
p_adj_method = "fdr",
merge_within = 1L,
ignore_strand = TRUE,
min_win = 1,
keyval = c("min", "merged"),
hm_pre = "hm",
...
)
# S4 method for class 'RangedSummarizedExperiment'
mergeByHMP(
x,
df = NULL,
w = NULL,
logfc = "logFC",
pval = "P",
cpm = "logCPM",
inc_cols = NULL,
p_adj_method = "fdr",
merge_within = 1L,
ignore_strand = FALSE,
hm_pre = "hm",
...
)
GenomicRanges object
Not used
data.frame with results of differential binding analysis performed
using a sliding window strategy. If not provided, the columns in the
mcols()
element of x
will be used
vector of weights to applied when calculating harmonic mean p-values
Column names for the values holding window specific estimates of change in binding (logfc), overall signal intensity (cpm) and the significance from statistical testing (pval).
(Optional) Character vector of any additional columns in
df
to return. Values will correspond to the range in the keyval_range
column
One of p.adjust.methods
or "fwer". If "fwer" is
specified the adjusted harmonic-mean p-value will be returned in a form
which strictly controls the experiment-wide FWER. Please see
vignette("harmonicmeanp") for more details
Merge any non-overlapping windows within this distance
Passed internally to reduce and findOverlaps
Only keep merged windows derived from at least this number
Return the key-value range as the window associated with the minimum p-value, or by merging the ranges from all windows with raw p-values below the merged harmonic-mean p-value
Prefix to add to the beginning of all HMP-derived columns
A GenomicRanges object with merged ranges from the original object along with summarised or representative values from the relevant columns. The range corresponding to a representative values is also returned as described above
When using sliding windows to test for differential signal, overlapping
windows can be merged based on the significance of results.
mergeByHMP()
merges overlapping windows using the asymptotically exact
harmonic mean p-value p.hmp from the individual,
window-level tests. This tests the Null Hypothesis that there is no
significance amongst the initial set of p-values, and returns a summarised
value which controls the FDR within a set of tests (Wilson, PNAS, 2019).
Multilevel testing across the set of results is currently implemented using
p_adj_method = "fwer"
Given that the harmonic mean p-value is calculated from the inverse p-values,
these are used to provide a weighted average of expression and logFC values
in the returned object. Any weights provided in w
are ignored for these
values as they are simple representative estimates.
The representative range returned in keyval_range
corresponds to the window
with the lowest p-value.
The total number of windows is also returned in the final object, with the summarised values n_up and n_down indicating the number of windows with raw p-values below the calculated harmonic mean p-value, and with the corresponding direction of change.
The column containing the harmonic mean p-values is returned as 'hmp'. An additional column with adjusted hmp-values is returned with the suffix '_*' added where the p-value adjustment method is added after the underscore.
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
set.seed(1001)
df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10))
mergeByHMP(x, df, pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#> seqnames ranges strand | n_windows n_up n_down keyval_range
#> <Rle> <IRanges> <Rle> | <integer> <integer> <integer> <GRanges>
#> [1] chr1 1-15 * | 2 1 0 chr1:6-15
#> [2] chr1 51-60 * | 1 0 1 chr1:51-60
#> logCPM logFC hmp hmp_fdr
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 6.65177 0.782560 0.0161575 0.0323149
#> [2] 7.85644 -0.185275 0.3592014 0.3592014
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
mcols(x) <- df
x
#> GRanges object with 3 ranges and 3 metadata columns:
#> seqnames ranges strand | logFC logCPM p
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
#> [1] chr1 1-10 * | 2.188648 5.49346 0.0184835
#> [2] chr1 6-15 * | -0.177547 7.44269 0.0126209
#> [3] chr1 51-60 * | -0.185275 7.85644 0.3201591
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeByHMP(x, pval = "p", p_adj_method = "fwer")
#> GRanges object with 2 ranges and 9 metadata columns:
#> seqnames ranges strand | n_windows n_up n_down keyval_range
#> <Rle> <IRanges> <Rle> | <integer> <integer> <integer> <GRanges>
#> [1] chr1 1-15 * | 2 1 0 chr1:6-15
#> [2] chr1 51-60 * | 1 0 1 chr1:51-60
#> logCPM logFC hmp subjectHits hmp_fwer
#> <numeric> <numeric> <numeric> <integer> <numeric>
#> [1] 6.65177 0.782560 0.0161575 1 0.0251708
#> [2] 7.85644 -0.185275 0.3592014 2 0.7995503
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths