Merge sliding windows using a specified column

mergeByCol(x, ...)

# S4 method for class 'GenomicRanges'
mergeByCol(
  x,
  df = NULL,
  col,
  by = c("max", "median", "mean", "min"),
  logfc = "logFC",
  pval = "P",
  inc_cols,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = TRUE,
  min_win = 1,
  ...
)

# S4 method for class 'RangedSummarizedExperiment'
mergeByCol(
  x,
  df = NULL,
  col,
  by = c("max", "median", "mean", "min"),
  logfc = "logFC",
  pval = "P",
  inc_cols,
  p_adj_method = "fdr",
  merge_within = 1L,
  ignore_strand = FALSE,
  ...
)



x
A GenomicRanges or SummarizedExperiment object

...
Not used

df
A data.frame-like object containing the columns of interest. If not provided, any columns in the mcols() slot will be used.

col
The column to select as representative of the merged ranges

by
The method for selecting representative values

logfc
Column containing logFC values

pval
Column containing p-values

inc_cols
Any additional columns to return. Output will always include the columns specified in the arguments col, logfc and pval. Note that values from any additional columns will correspond to the selected range returned in keyval_range.

p_adj_method
Any of p.adjust.methods

merge_within
Merge any ranges within this distance

ignore_strand
Passed internally to reduce and findOverlaps

min_win
Only keep merged windows derived from at least this number of windows


A GenomicRanges object


This function merges sliding windows, using the values in a given column to select representative values for the resulting merged windows. Values can be chosen from the specified column using any of min(), max(), mean() or median(), although max() is strongly recommended when specifying values like logCPM. Once a representative range is selected using the specified column, values from the columns specified using inc_cols are also returned. In addition to these columns, the range of the representative window is returned in the mcols element as a GRanges object in the column keyval_range.
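
The selection step described above can be illustrated with a minimal sketch that uses only GenomicRanges functions. It is not the function's actual implementation: the logCPM values are made up, and the use of reduce() with min.gapwidth to stand in for merge_within is an assumption.

```r
library(GenomicRanges)

# Three windows with a signal column; the logCPM values are illustrative only
x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
x$logCPM <- c(5.5, 7.4, 7.9)

# Merge overlapping or adjacent windows
# (assumption: min.gapwidth plays the role of merge_within)
merged <- reduce(x, min.gapwidth = 1L, ignore.strand = TRUE)

# Map each original window to the merged range that contains it
hits <- findOverlaps(x, merged, ignore.strand = TRUE)
win_by_range <- split(queryHits(hits), subjectHits(hits))

# Within each merged range, keep the window with the highest logCPM
keep <- unname(vapply(
  win_by_range,
  function(i) i[which.max(x$logCPM[i])],
  integer(1)
))

# Attach the window count, the representative range and its value
merged$n_windows <- unname(lengths(win_by_range))
merged$keyval_range <- granges(x[keep])
merged$logCPM <- x$logCPM[keep]
merged
```

Here the first two windows collapse into chr1:1-15 and the window chr1:6-15 is kept as its representative, because it has the larger logCPM.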

Merging windows using either the logFC or p-value columns is not implemented.

If adjusted p-values are requested, an additional column named the same as the initial p-value column, but tagged with the adjustment method, will be added. In addition, using the p-value from the selected window as a threshold, the number of windows with lower p-values is counted by direction and returned in the final object. The selected window will always be counted as up or down regardless of significance, as its own p-value is taken as the threshold. This is similar to the approach taken by cluster-direction.
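
The adjustment and counting steps can be sketched in base R as below. This is a hedged illustration, not the function's internals: the logFC and p-values are copied from the example output on this page, the range_id and selected vectors are hypothetical helpers, and it assumes that the count includes the selected window itself (p-values at or below the threshold).

```r
# Per-window values copied from the example output on this page
logFC <- c(2.188648, -0.177547, -0.185275)
p <- c(0.0184835, 0.0126209, 0.3201591)

# Hypothetical bookkeeping: the merged range each window belongs to,
# and the representative window chosen for each merged range
range_id <- c(1L, 1L, 2L)
selected <- c(2L, 3L)

# Adjust the representative p-values across the merged ranges
p_fdr <- p.adjust(p[selected], method = "fdr")

# Use each representative p-value as the threshold for its merged range
thresh <- p[selected][range_id]

# Count windows at or below the threshold, split by direction of logFC
n_up <- as.integer(tapply(p <= thresh & logFC > 0, range_id, sum))
n_down <- as.integer(tapply(p <= thresh & logFC < 0, range_id, sum))

data.frame(n_up, n_down, p = p[selected], p_fdr)
```

With these inputs, the first window has the larger p-value in its merged range, so it is not counted, and each merged range reports only its selected window as down.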

If called on a SummarizedExperiment object, the function will be applied to the rowRanges element.


x <- GRanges(c("chr1:1-10", "chr1:6-15", "chr1:51-60"))
df <- DataFrame(logFC = rnorm(3), logCPM = rnorm(3, 8), p = rexp(3, 10))
mergeByCol(x, df, col = "logCPM", pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         0         1    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         1   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   7.44269 -0.177547 0.0126209 0.0252419
#>   [2]   7.85644 -0.185275 0.3201591 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mcols(x) <- df
x
#> GRanges object with 3 ranges and 3 metadata columns:
#>       seqnames    ranges strand |     logFC    logCPM         p
#>          <Rle> <IRanges>  <Rle> | <numeric> <numeric> <numeric>
#>   [1]     chr1      1-10      * |  2.188648   5.49346 0.0184835
#>   [2]     chr1      6-15      * | -0.177547   7.44269 0.0126209
#>   [3]     chr1     51-60      * | -0.185275   7.85644 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
mergeByCol(x, col = "logCPM", pval = "p")
#> GRanges object with 2 ranges and 8 metadata columns:
#>       seqnames    ranges strand | n_windows      n_up    n_down keyval_range
#>          <Rle> <IRanges>  <Rle> | <integer> <integer> <integer>    <GRanges>
#>   [1]     chr1      1-15      * |         2         0         1    chr1:6-15
#>   [2]     chr1     51-60      * |         1         0         1   chr1:51-60
#>          logCPM     logFC         p     p_fdr
#>       <numeric> <numeric> <numeric> <numeric>
#>   [1]   7.44269 -0.177547 0.0126209 0.0252419
#>   [2]   7.85644 -0.185275 0.3201591 0.3201591
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths