Detect differential ChIP signal using one of many approaches
fitAssayDiff(x, ...)
# S4 method for class 'SummarizedExperiment'
fitAssayDiff(
x,
assay = "counts",
design = NULL,
coef = NULL,
lib.size = "totals",
method = c("qlf", "lt", "wald"),
norm = c("none", "TMM", "RLE", "TMMwsp", "upperquartile"),
groups = NULL,
fc = 1,
lfc = log2(fc),
asRanges = FALSE,
offset = NULL,
weighted = FALSE,
...,
null = c("interval", "worst.case"),
robust = FALSE,
type = c("apeglm", "ashr", "normal")
)
a SummarizedExperiment object
Passed to normLibSizes and
The assay to use for analysis
The design matrix to use for analysis
The required column from the design matrix
The column within the colData element which contains the library size information. If set to NULL, column summaries will be used.
the analytic method to be used. Can be 'qlf' which will fit counts using the glmQLFit strategy , or 'lt' which fits the limma-trend model on logCPM, or pre-processed logCPM values. Setting method = 'wald' will call nbinomWaldTest
The normalisation strategy to use when running the glmQLF model or the Wald test. The value 'none' relies solely on library-size normalisation, and is the default. All methods available in normLibSizes are implemented. Ignored when using method = "lt"
character(1) If a column name is supplied here, group-based normalisation will be applied to GLM models treating data in this column as a grouping factor. Ignored when using method = "lt"
logical(1). By default, the returned object will be a
SummarizedExperiment
object with the results added to the rowData
element. Setting asRanges = TRUE
will only return the GRanges object from
this element
If provided will be used as the offset when a DGEList object is created during model fitting for method = 'qlf'
logical(1) Passed to normLibSizes. Only used when applying a TMM-type normalisation strategy
Passed to glmTreat glmQLFit when method = "qlf". If method = "lt", instead passed to lmFit
Passed to lfcShrink
A SummarizedExperiment object with results set as the rowData
element.
Any existing columns not contained in the differential ChIP results will be
retained.
Results from testing will contain logCPM, logFC, PValue and any t/F
statistic as appropriate, along with an FDR-adjusted p-value.
If specifying a range-based H0 by setting lfc != 0, an additional column p_mu0 will be included which is the p-value for the point H0: logFC = 0. These are not used for FDR-adjusted p-values but can be helpful when integrating multiple ChIP targets due to the increase in false-negatives when using a range-based H0, and when requiring more accurate identification of truly unchanged sites, as opposed to those which simply fail to achieve significance using a range-based H0 where arbitrary cutoff values are used.
Starting with a SummarizedExperiment object this function fits either a glmQLFit model to count data, performs the nbinomWaldTest on count data, or applies the limma-trend model to logCPM data.
If fitting Generalised Linear Models via glmQLFit, options for normalisation are "none", which normalises to library size. Existing library sizes are commonly found in the "totals" column of the colData element and this is attempted by default. All methods provided in normLibSizes are also implemented, with the added possibility of normalising within groups instead of across the entire dataset. To enable this, the column with the grouping factor is expected to be in the colData element and is simply called by column name. No normalisation is applied when using the limma-trend model, as this allows for previous normalisation strategies to be performed on the data.
If testing with nbinomWaldTest, applying RLE normalisation without groups, and using colSums for library sizes (instead of total alignments), the standard normalisation factors from estimateSizeFactors will be used. In all other scenarios, normalisation factors as returned by normLibSizes will be used. The fitType is set to 'local' when estimating dispersions, and this can be easily modified by passing fitType via the dot arguments. Results are additionally returned after applying lfcShrink, including the svalue returned by this approach.
Normalising to ChIP Input samples is not yet implemented. Similarly, the use of offsets when applying the Wald test is not yet implemented.
Range-based hypothesis testing is implemented using glmTreat or treat. Setting fc to 1 (or lfc to 0) will default to a point-based null hypothesis, equivalent to either glmQLFTest (method = "qlf") or eBayes (method = "lt"). When applying nbinomWaldTest, lfcShrink will be applied.
It should also be noted that this is primarily a convenience function and if requiring intermediate output from any steps, then these can be run individually as conventionally specified.
nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colnames(counts) <- paste0("Sample_", seq_len(ncols))
df <- DataFrame(treat = c("A", "A", "A", "B", "B", "B"))
df$treat <- as.factor(df$treat)
se <- SummarizedExperiment(
assays = SimpleList(counts = counts), colData = df
)
X <- model.matrix(~treat, colData(se))
se <- fitAssayDiff(se, design = X, lib.size = NULL)
#> Creating DGE list...
#> Calculating experiment-wide normalisation factors...
#> Estimating dispersions...
#> Running glmQLFit...
rowData(se)
#> DataFrame with 200 rows and 5 columns
#> logFC logCPM F PValue FDR
#> <numeric> <numeric> <numeric> <numeric> <numeric>
#> 1 4.3090759 11.7021 11.30887207 0.00313808 0.627616
#> 2 0.0833607 12.2407 0.00799283 0.92966253 0.994965
#> 3 1.6791893 11.6748 2.11603063 0.16147713 0.994965
#> 4 -1.1910714 12.0524 1.56369574 0.22574111 0.994965
#> 5 0.5181576 12.0442 0.27527931 0.60565317 0.994965
#> ... ... ... ... ... ...
#> 196 0.0907008 12.7687 0.02288526 0.8812907 0.994965
#> 197 1.0954842 12.3006 1.42572030 0.2466118 0.994965
#> 198 -1.1430187 12.6072 3.15452006 0.0911301 0.994965
#> 199 0.1461608 12.3253 0.03170114 0.8604991 0.994965
#> 200 0.0461835 11.8591 0.00114604 0.9733340 0.994965