Merge sliding windows using a specified column

mergeByCol(x, ...)

# S4 method for class 'GenomicRanges'
mergeByCol(
  x,
  df = NULL,
  col,
  by = c("max", "median", "mean", "min"),
  logfc = "logFC",
  pval = "P",
  inc_cols,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = TRUE,
  min_win = 1,
  ...
)

# S4 method for class 'RangedSummarizedExperiment'
mergeByCol(
  x,
  df = NULL,
  col,
  by = c("max", "median", "mean", "min"),
  logfc = "logFC",
  pval = "P",
  inc_cols,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = FALSE,
  ...
)

Arguments

x

A GenomicRanges or RangedSummarizedExperiment object

...

Not used

df

A data.frame-like object containing the columns of interest. If not provided, any columns in the mcols() slot will be used.

col

The column to select as representative of the merged ranges

by

The method for selecting representative values

logfc

Column containing logFC values

pval

Column containing p-values

inc_cols

Any additional columns to return. Output will always include columns specified in the arguments col, logfc and pval. Note that values from any additional columns will correspond to the selected range returned in keyval_range

p_adj_method

Any of p.adjust.methods

merge_within

Merge any ranges within this distance

ignore_strand

Passed internally to reduce and findOverlaps

min_win

Only keep merged windows derived from at least this number of the original windows

Value

A GenomicRanges object

Details

This merges sliding windows, using the values in a given column to select a representative window for each of the resulting merged windows. Values can be chosen from the specified column using any of min(), max(), mean() or median(), although max() is strongly recommended when specifying values like logCPM. Once a representative range is selected using the specified column, values from the columns specified using inc_cols are also returned for that range. In addition to these columns, the range of the representative window is returned in the mcols element as a GRanges object in the column keyval_range.
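The selection step described above can be sketched in base R, using the three windows from the Examples section. This is only an illustration of the idea, not the package's implementation: the helper cluster_windows() is hypothetical, and the assumption that windows are grouped whenever the gap between them is smaller than merge_within is taken from the argument description rather than from the source code.

```r
# Toy windows copied from the Examples section (all on chr1)
win <- data.frame(
  start  = c(1, 6, 51),
  end    = c(10, 15, 60),
  logFC  = c(2.188648, -0.177547, -0.185275),
  logCPM = c(5.49346, 7.44269, 7.85644),
  p      = c(0.0184835, 0.0126209, 0.3201591)
)

# Hypothetical helper: assign a cluster id to each window, starting a new
# cluster when the gap to the running cluster end is at least merge_within.
# Assumes windows are on one chromosome and sorted by start.
cluster_windows <- function(start, end, merge_within = 1) {
  running_end <- cummax(end)
  gap <- start[-1] - running_end[-length(end)] - 1
  cumsum(c(TRUE, gap >= merge_within))
}
win$cluster <- cluster_windows(win$start, win$end)

# Within each cluster, keep the window with the largest logCPM (by = "max")
# and report its values alongside the merged range
rep_rows <- lapply(split(win, win$cluster), function(d) {
  best <- d[which.max(d$logCPM), ]
  data.frame(
    start = min(d$start), end = max(d$end), n_windows = nrow(d),
    keyval_range = paste0(best$start, "-", best$end),
    logCPM = best$logCPM, logFC = best$logFC, p = best$p
  )
})
merged <- do.call(rbind, rep_rows)
merged
```

With these values, the first two windows fall into one merged range (1-15) represented by the window 6-15, which agrees with the keyval_range column in the Examples output.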

Merging windows using either the logFC or p-value columns is not implemented.

If adjusted p-values are requested, an additional column named the same as the initial p-value column, but tagged with the adjustment method (e.g. p_fdr), will be added. In addition, using the p-value from the selected window as a threshold, the number of windows with lower p-values is counted by direction and returned in the final object. The selected window will always be counted as up/down regardless of significance, as the p-value for this window is taken as the threshold. This approach is similar to cluster-direction.
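As a rough illustration of the counting and adjustment described above, the following base R sketch reuses the values from the Examples section. It is not the package's code. It assumes that windows are counted when their p-value is at or below that of the selected window, that direction is taken from the sign of logFC, and that adjustment is applied across the representative p-values of the merged ranges.

```r
# The two windows making up the first merged range in the Examples section
logFC <- c(2.188648, -0.177547)
p     <- c(0.0184835, 0.0126209)
sel   <- 2  # index of the representative window (largest logCPM)

# Count windows at or below the selected window's p-value, by direction.
# The selected window always passes, so it is counted as either up or down.
keep   <- p <= p[sel]
n_up   <- sum(keep & logFC > 0)
n_down <- sum(keep & logFC < 0)

# Adjust the representative p-values across the two merged ranges
p_rep <- c(0.0126209, 0.3201591)
p_fdr <- p.adjust(p_rep, method = "fdr")

c(n_up = n_up, n_down = n_down)
p_fdr
```

Here the first window has a larger p-value than the selected one, so only the selected window is counted, giving n_up = 0 and n_down = 1 as in the Examples output.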

If called on a SummarizedExperiment object, the function will be applied to the rowRanges element.

Examples

x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
set.seed(1001)
df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3,8), p = rexp(3, 10))
mergeByCol(x, df, col = "logCPM", pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         0         1    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         1   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   7.44269 -0.177547 0.0126209 0.0252419
#>   [2]   7.85644 -0.185275 0.3201591 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mcols(x) <- df
x
#> GRanges object with 3 ranges and 3 metadata columns:
#>       seqnames    ranges strand |     logFC    logCPM         p
#>          <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric>
#>   [1]     chr1      1-10      * |  2.188648   5.49346 0.0184835
#>   [2]     chr1      6-15      * | -0.177547   7.44269 0.0126209
#>   [3]     chr1     51-60      * | -0.185275   7.85644 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeByCol(x, col = "logCPM", pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         0         1    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         1   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   7.44269 -0.177547 0.0126209 0.0252419
#>   [2]   7.85644 -0.185275 0.3201591 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths