Test enrichment across a cluster of motifs using a background set of sequences
Source:R/testClusterEnrich.R
testClusterEnrich.Rd
Test for enrichment of any motif within a cluster across a set of sequences using a background set to derive a NULL hypothesis
Arguments
- cl
A list of Position Weight Matrices, universalmotifs, with each element representing clusters of related matrices
- stringset
An XStringSet with equal sequence widths
- bg
An XStringSet with the same sequence widths as the test XStringset
- var
A column in the mcols element of bg, usually denoting an iteration number
- model
The model used for analysis
- sort_by
Column to sort results by
- mc.cores
Passed to mclapply
- ...
Passed to getPwmMatches or countPwmMatches
Value
See testMotifEnrich
Details
This extends the analytic methods offered by testMotifEnrich using PWMs grouped into a set of clusters. As with all cluster-level approaches, hits from multiple PWMs which overlap are counted as a single hit ensuring that duplicated matches are not double-counted, and that only individual positions within the sequences are.
Examples
## Load the example peaks & the sequences
data("ar_er_peaks")
data("ar_er_seq")
sq <- seqinfo(ar_er_peaks)
## Now sample size-matched ranges 10 times larger. In real-world analyses,
## this set should be sampled as at least 1000x larger, ensuring features
## are matched to your requirements. This example masks regions with known N
## content, including centromeres & telomeres
data("hg19_mask")
set.seed(305)
bg_ranges <- makeRMRanges(
ar_er_peaks, GRanges(sq)[1], exclude = hg19_mask, n_iter = 10
)
## Convert ranges to DNAStringSets
library(BSgenome.Hsapiens.UCSC.hg19)
#> Loading required package: BSgenome
#> Loading required package: BiocIO
#> Loading required package: rtracklayer
#>
#> Attaching package: ‘rtracklayer’
#> The following object is masked from ‘package:BiocIO’:
#>
#> FileForFormat
genome <- BSgenome.Hsapiens.UCSC.hg19
bg_seq <- getSeq(genome, bg_ranges)
## Test for enrichment of clustered motifs
data("ex_pfm")
cl <- list(A = ex_pfm[1], B = ex_pfm[2:3])
testClusterEnrich(cl, ar_er_seq, bg_seq, model = "poisson")
#> sequences matches expected enrichment Z p fdr
#> B 849 121 26.7 4.531835 18.24971 1.625599e-40 3.251198e-40
#> A 849 22 0.5 44.000000 30.40559 1.315112e-28 1.315112e-28
#> est_bg_rate
#> B 0.0314487633
#> A 0.0005889282