Skip to contents

Test for enrichment of any motif within a cluster across a set of sequences using a background set to derive a NULL hypothesis

Usage

testClusterEnrich(
  cl,
  stringset,
  bg,
  var = "iteration",
  model = c("quasipoisson", "hypergeometric", "poisson", "iteration"),
  sort_by = c("p", "none"),
  mc.cores = 1,
  ...
)

Arguments

cl

A list of Position Weight Matrices, universalmotifs, with each element representing clusters of related matrices

stringset

An XStringSet with equal sequence widths

bg

An XStringSet with the same sequence widths as the test XStringset

var

A column in the mcols element of bg, usually denoting an iteration number

model

The model used for analysis

sort_by

Column to sort results by

mc.cores

Passed to mclapply

...

Passed to getPwmMatches or countPwmMatches

Value

See testMotifEnrich

Details

This extends the analytic methods offered by testMotifEnrich using PWMs grouped into a set of clusters. As with all cluster-level approaches, hits from multiple PWMs which overlap are counted as a single hit ensuring that duplicated matches are not double-counted, and that only individual positions within the sequences are.

Examples

## Load the example peaks & the sequences
data("ar_er_peaks")
data("ar_er_seq")
sq <- seqinfo(ar_er_peaks)
## Now sample size-matched ranges 10 times larger. In real-world analyses,
## this set should be sampled as at least 1000x larger, ensuring features
## are matched to your requirements. This example masks regions with known N
## content, including centromeres & telomeres
data("hg19_mask")
set.seed(305)
bg_ranges <- makeRMRanges(
  ar_er_peaks, GRanges(sq)[1], exclude = hg19_mask, n_iter = 10
)

## Convert ranges to DNAStringSets
library(BSgenome.Hsapiens.UCSC.hg19)
#> Loading required package: BSgenome
#> Loading required package: BiocIO
#> Loading required package: rtracklayer
#> 
#> Attaching package: ‘rtracklayer’
#> The following object is masked from ‘package:BiocIO’:
#> 
#>     FileForFormat
genome <- BSgenome.Hsapiens.UCSC.hg19
bg_seq <- getSeq(genome, bg_ranges)

## Test for enrichment of clustered motifs
data("ex_pfm")
cl <- list(A = ex_pfm[1], B = ex_pfm[2:3])
testClusterEnrich(cl, ar_er_seq, bg_seq, model = "poisson")
#>   sequences matches expected enrichment        Z            p          fdr
#> B       849     121     26.7   4.531835 18.24971 1.625599e-40 3.251198e-40
#> A       849      22      0.5  44.000000 30.40559 1.315112e-28 1.315112e-28
#>    est_bg_rate
#> B 0.0314487633
#> A 0.0005889282