Apply two filters to counts generated using sliding windows
dualFilter(
x,
bg = NULL,
ref,
q = 0.5,
logCPM = TRUE,
keep.totals = TRUE,
bin.size = NULL,
prior.count = 2,
BPPARAM = bpparam()
)
RangedSummarizedExperiment containing sample counts
RangedSummarizedExperiment containing background/input counts, or alternate method for selecting samples from within x, such as a logical, numeric or character vector
GRanges object containing ranges where signal is expected
The upper percentile of the reference ranges expected to be returned when tuning the filtering criteria
logical(1) Add a logCPM assay to the returned data
logical(1) Keep the original library sizes or replace using only the retained windows
Bin sizes when calling filterWindowsControl. If not specified will default to the largest of 2000bp or 10x the window size
Passed to filterWindowsControl and filterWindowsProportion
Settings for running in parallel
A RangedSummarizedExperiment which is a
filtered subset of the original object. If requested the assay "logCPM" will
be added (TRUE
by default)
This function will take sliding (or tiling) windows for it's input as a RangedSummarizedExperiment object. The dual strategy of applying filterWindowsControl and filterWindowsProportion will then be applied. A set of reference ranges for which signal is expected is used to refine the filtering criteria.
Cutoff values are found for both signal relative to input and overall signal,
such that the 100*q%
of the (sliding) windows which overlap a reference
range will be returned, along with any others which match the
dual filtering criteria.
In general, higher values of q
will return more windows as those with weak
signal and a marginal overlap with a reference range will be returned.
Lower values will ensure that fewer windows, generally with the strongest
signal, are retained.
Cutoff values for both criteria are added to the metadata
element of the returned object.
If setting bg = NULL
the filterWindowsControl step will be
ignored and only the filterWindowsProportion will be used.
This should only be performed if no Input sample is available.
Please note that the any .bam
files referred to in the supplied objects
must be accessible to this function. It will not run on a separate
machine or file structure to that which the original sliding windows were
prepared. Please see the example/vignette for runnable code.
# \donttest{
## Taken from the differential_binding vignette
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ lubridate::%within%() masks IRanges::%within%()
#> ✖ ggplot2::Position() masks BiocGenerics::Position(), base::Position()
#> ✖ dplyr::collapse() masks IRanges::collapse()
#> ✖ dplyr::combine() masks Biobase::combine(), BiocGenerics::combine()
#> ✖ dplyr::count() masks matrixStats::count()
#> ✖ dplyr::desc() masks IRanges::desc()
#> ✖ tidyr::expand() masks S4Vectors::expand()
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::first() masks S4Vectors::first()
#> ✖ dplyr::lag() masks stats::lag()
#> ✖ purrr::reduce() masks GenomicRanges::reduce(), IRanges::reduce()
#> ✖ dplyr::rename() masks S4Vectors::rename()
#> ✖ lubridate::second() masks S4Vectors::second()
#> ✖ lubridate::second<-() masks S4Vectors::second<-()
#> ✖ dplyr::slice() masks IRanges::slice()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(Rsamtools)
#> Loading required package: Biostrings
#> Loading required package: XVector
#>
#> Attaching package: 'XVector'
#> The following object is masked from 'package:purrr':
#>
#> compact
#>
#> Attaching package: 'Biostrings'
#> The following object is masked from 'package:base':
#>
#> strsplit
library(csaw)
library(BiocParallel)
library(rtracklayer)
## For this function we need a set of counts using sliding windows and the
## original BamFiles from which they were taken
## First we'll set up the bam file list
bfl <- system.file(
"extdata", "bam", c("ex1.bam", "ex2.bam", "input.bam"), package = "extraChIPs"
) %>%
BamFileList() %>%
setNames(c("ex1", "ex2", "input"))
## Then define the readParam settings for csaw::readParam()
rp <- readParam(
pe = "none",
dedup = TRUE,
restrict = "chr10"
)
## Now we can form our sliding window object with the counts.
wincounts <- windowCounts(
bam.files = bfl,
spacing = 60,
width = 180,
ext = 200,
filter = 1,
param = rp
)
## As this is a subset of reads, add the initial library sizes for accuracy
## Note that this step is not normally required
wincounts$totals <- c(964076L, 989543L, 1172179L)
## We should also update the metadata for our counts
wincounts$sample <- colnames(wincounts)
wincounts$treat <- as.factor(c("ctrl", "treat", NA))
colData(wincounts)
#> DataFrame with 3 rows and 6 columns
#> bam.files totals ext rlen sample treat
#> <character> <integer> <integer> <integer> <character> <factor>
#> ex1 /__w/_temp/Library/e.. 964076 200 73 ex1 ctrl
#> ex2 /__w/_temp/Library/e.. 989543 200 74 ex2 treat
#> input /__w/_temp/Library/e.. 1172179 200 74 input NA
## The function dualFilter requires a set of peaks which will guide the
## filtering step. This indicate where genuine signal is likely to be found
## and will perform the filtering based on a) signal above the input, and
## b) The overall signal level, using the guide set of peaks to inform the
## cutoff values for inclusion
peaks <- import.bed(
system.file("extdata", "peaks.bed.gz", package = "extraChIPs")
)
filtcounts <- dualFilter(
x = wincounts, bg = "input", ref = peaks,
q = 0.8 # Better to use q = 0.5 on real data
)
filtcounts
#> class: RangedSummarizedExperiment
#> dim: 101 2
#> metadata(7): spacing width ... final.ext cuts
#> assays(2): counts logCPM
#> rownames: NULL
#> rowData names(1): overlaps_ref
#> colnames(2): ex1 ex2
#> colData names(6): bam.files totals ... sample treat
# }