Merge sliding windows using a specified column
mergeByCol(x, ...)
# S4 method for class 'GenomicRanges'
mergeByCol(
  x,
  df = NULL,
  col,
  by = c("max", "median", "mean", "min"),
  logfc = "logFC",
  pval = "P",
  inc_cols,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = TRUE,
  min_win = 1,
  ...
)
# S4 method for class 'RangedSummarizedExperiment'
mergeByCol(
  x,
  df = NULL,
  col,
  by = c("max", "median", "mean", "min"),
  logfc = "logFC",
  pval = "P",
  inc_cols,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = FALSE,
  ...
)
x: A GenomicRanges or SummarizedExperiment object
...: Not used
df: A data.frame-like object containing the columns of interest. If not provided, any columns in the mcols() slot will be used.
col: The column to select as representative of the merged ranges
by: The method for selecting representative values
logfc: Column containing logFC values
pval: Column containing p-values
inc_cols: Any additional columns to return. Output will always include columns specified in the arguments col, logfc and pval. Note that values from any additional columns will correspond to the selected range returned in keyval_range
p_adj_method: Any of p.adjust.methods
merge_within: Merge any ranges within this distance
ignore_strand: Passed internally to reduce and findOverlaps
min_win: Only keep merged windows derived from at least this number of windows
A GenomicRanges object
This merges sliding windows using the values in a given column to select
representative values for the subsequent merged windows.
Values can be chosen from the specified column using any of min(), max(), mean() or median(), although max() is strongly recommended when specifying values like logCPM.
Once a representative range is selected using the specified column, values from columns specified using inc_cols are also returned.
In addition to these columns, the range from the representative window is returned in the mcols element as a GRanges object in the column keyval_range.
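As a rough illustration of the selection step with by = "max", the base R sketch below picks the window with the highest logCPM within each merged region. It is not the package's implementation: the `windows` data frame and its `region` grouping column are hypothetical stand-ins for the merged ranges, and the values are copied from the example data on this page.

```r
# Toy sliding windows: the first two overlap (region 1), the third stands alone.
# `region` is a hypothetical grouping column standing in for the merged ranges.
windows <- data.frame(
  range  = c("chr1:1-10", "chr1:6-15", "chr1:51-60"),
  region = c(1, 1, 2),
  logFC  = c(2.188648, -0.177547, -0.185275),
  logCPM = c(5.49346, 7.44269, 7.85644),
  stringsAsFactors = FALSE
)

# Within each region, find the row index of the window with the highest logCPM
key_rows <- vapply(
  split(seq_len(nrow(windows)), windows$region),
  function(i) i[which.max(windows$logCPM[i])],
  integer(1)
)

# Values in every other column are taken from that same window
representative <- windows[key_rows, ]
representative
```

In this sketch the `range` column of `representative` plays the role of keyval_range, and the remaining columns play the role of those requested via inc_cols.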
Merging windows using either the logFC or p-value columns is not implemented.
If adjusted p-values are requested, an additional column named the same as the initial p-value column, but tagged with the adjustment method, will be added. In addition, the p-value from the selected window is used as a threshold, and the number of windows with lower p-values is counted by direction and returned in the final object. The selected window will always be counted as up or down regardless of significance, as its own p-value is taken as the threshold. This approach is similar to cluster-direction.
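The counting and adjustment steps can be sketched in base R as below. This rests on two assumptions that are consistent with the example output on this page but not confirmed by the text: that windows are compared to the threshold using `<=`, and that the adjustment is applied across the representative p-values of the merged regions. The variable names are illustrative only.

```r
# Windows from one merged region, using values from the example data
logFC  <- c(2.188648, -0.177547)
p      <- c(0.0184835, 0.0126209)
logCPM <- c(5.49346, 7.44269)

# The representative window is the one with the highest logCPM
key <- which.max(logCPM)

# Its p-value is the threshold, so the selected window is always counted
# (assumption: the comparison is inclusive)
at_threshold <- p <= p[key]
n_up   <- sum(at_threshold & logFC > 0)
n_down <- sum(at_threshold & logFC < 0)

# Assumption: adjustment runs across the representative p-values of all
# merged regions; 0.3201591 is the p-value of the second region
p_merged <- c(p[key], 0.3201591)
p_fdr <- p.adjust(p_merged, method = "fdr")
```

Here the first window has a positive logFC but a p-value above the threshold, so it is not counted, giving n_up = 0 and n_down = 1 for the region.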
If called on a SummarizedExperiment object, the function will be applied to the rowRanges element.
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
set.seed(1001)
df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10))
mergeByCol(x, df, col = "logCPM", pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#> seqnames ranges strand | n_windows n_up n_down keyval_range
#> <Rle> <IRanges> <Rle> | <integer> <integer> <integer> <GRanges>
#> [1] chr1 1-15 * | 2 0 1 chr1:6-15
#> [2] chr1 51-60 * | 1 0 1 chr1:51-60
#> logCPM logFC p p_fdr
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 7.44269 -0.177547 0.0126209 0.0252419
#> [2] 7.85644 -0.185275 0.3201591 0.3201591
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
mcols(x) <- df
x
#> GRanges object with 3 ranges and 3 metadata columns:
#> seqnames ranges strand | logFC logCPM p
#> <Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
#> [1] chr1 1-10 * | 2.188648 5.49346 0.0184835
#> [2] chr1 6-15 * | -0.177547 7.44269 0.0126209
#> [3] chr1 51-60 * | -0.185275 7.85644 0.3201591
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeByCol(x, col = "logCPM", pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#> seqnames ranges strand | n_windows n_up n_down keyval_range
#> <Rle> <IRanges> <Rle> | <integer> <integer> <integer> <GRanges>
#> [1] chr1 1-15 * | 2 0 1 chr1:6-15
#> [2] chr1 51-60 * | 1 0 1 chr1:51-60
#> logCPM logFC p p_fdr
#> <numeric> <numeric> <numeric> <numeric>
#> [1] 7.44269 -0.177547 0.0126209 0.0252419
#> [2] 7.85644 -0.185275 0.3201591 0.3201591
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths