Black Ochre Data Laboratories, Telethon Kids Institute
March 17, 2023
ngsReports FastQC, cutadapt, STAR, macs2 etc.extraChIPs GRanges objectsGRAVI workflowBut first…

extraChIPs currently up on BioconductorGRanges) are analogous to bed filesGRanges focus on the ranges componentmcols() elementtidyverse?
plyrangesplyrangestibble Coercion## S3 method for class 'DataFrame'
as_tibble(x, rangeAsChar = TRUE, ...)
## S3 method for class 'GenomicRanges'
as_tibble(x, rangeAsChar = TRUE, name = "range", ...)
## S3 method for class 'Seqinfo'
as_tibble(x, ...)
## S3 method for class 'GInteractions'
as_tibble(x, rangeAsChar = TRUE, suffix = c(".x", ".y"), ...)ggplot2 and dplyr etcS4 Compressed list columns well (so far)
vctrs::vec_proxy() to coerce to S3 liststibble CoercionStarting with the protein-coding transcripts for CTLA4
GRanges object with 4 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 204732494-204738688 + | CTLA4 CTLA4-205
[2] chr2 204732666-204737498 + | CTLA4 CTLA4-201
[3] chr2 204732666-204737535 + | CTLA4 CTLA4-204
[4] chr2 204732666-204737535 + | CTLA4 CTLA4-203
-------
seqinfo: 24 sequences from GRCh37 genome
Seqinfo object with 24 sequences from GRCh37 genome:
seqnames seqlengths isCircular genome
chr1 249250621 FALSE GRCh37
chr2 243199373 FALSE GRCh37
chr3 198022430 FALSE GRCh37
chr4 191154276 FALSE GRCh37
chr5 180915260 FALSE GRCh37
... ... ... ...
chr20 63025520 FALSE GRCh37
chr21 48129895 FALSE GRCh37
chr22 51304566 FALSE GRCh37
chrX 155270560 FALSE GRCh37
chrY 59373566 FALSE GRCh37
tibble CoercionNow perform the coercion using as_tibble()
Coerce back to a GRanges uses colToRanges()
GRanges object with 4 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 204732494-204738688 + | CTLA4 CTLA4-205
[2] chr2 204732666-204737498 + | CTLA4 CTLA4-201
[3] chr2 204732666-204737535 + | CTLA4 CTLA4-204
[4] chr2 204732666-204737535 + | CTLA4 CTLA4-203
-------
seqinfo: 24 sequences from GRCh37 genome
GRanges functions just operate on the range
reduce, intersect, setdiff, unionextraChIPs variants also return the mcols element
reduceMC, intersectMC, setdiffMC, unionMCdistinctMC and chopMC replicate tidyverse functionsreduceMC()Say we wish to find the TSS for CTLA transcripts
GRanges object with 4 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 204732494 + | CTLA4 CTLA4-205
[2] chr2 204732666 + | CTLA4 CTLA4-201
[3] chr2 204732666 + | CTLA4 CTLA4-204
[4] chr2 204732666 + | CTLA4 CTLA4-203
-------
seqinfo: 24 sequences from GRCh37 genome
reduceMC()Say we wish to find the TSS for CTLA transcripts
GRanges object with 4 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 204732494 + | CTLA4 CTLA4-205
[2] chr2 204732666 + | CTLA4 CTLA4-201
[3] chr2 204732666 + | CTLA4 CTLA4-204
[4] chr2 204732666 + | CTLA4 CTLA4-203
-------
seqinfo: 24 sequences from GRCh37 genome
reduceMC() returns the mcols as well
GRanges object with 2 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <CharacterList>
[1] chr2 204732494 + | CTLA4 CTLA4-205
[2] chr2 204732666 + | CTLA4 CTLA4-201,CTLA4-204,CTLA4-203
-------
seqinfo: 24 sequences from GRCh37 genome
intersectMC()GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 204732401-204732600 *
-------
seqinfo: 24 sequences from GRCh37 genome
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 204732494-204732600 *
-------
seqinfo: 24 sequences from GRCh37 genome
subsetByOverlaps() would return the entire initial rangemcols from the query range are returnedsetdiffMC()GRanges object with 4 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 204732494-204738688 + | CTLA4 CTLA4-205
[2] chr2 204732666-204737498 + | CTLA4 CTLA4-201
[3] chr2 204732666-204737535 + | CTLA4 CTLA4-204
[4] chr2 204732666-204737535 + | CTLA4 CTLA4-203
-------
seqinfo: 24 sequences from GRCh37 genome
GRanges object with 2 ranges and 2 metadata columns:
seqnames ranges strand | gene_name transcript_name
<Rle> <IRanges> <Rle> | <character> <character>
[1] chr2 204732494-204732665 + | CTLA4 CTLA4-205
[2] chr2 204737536-204738688 + | CTLA4 CTLA4-205
-------
seqinfo: 24 sequences from GRCh37 genome
mcols informationdistinctMC()plyranges::join_*() with multiple matchesGRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | prop_methylated
<Rle> <IRanges> <Rle> | <numeric>
[1] chr2 204732499 * | 0.20
[2] chr2 204732562 * | 0.99
-------
seqinfo: 24 sequences from GRCh37 genome
distinctMC()plyranges::join_*() with multiple matchesGRanges object with 2 ranges and 1 metadata column:
seqnames ranges strand | prop_methylated
<Rle> <IRanges> <Rle> | <numeric>
[1] chr2 204732499 * | 0.20
[2] chr2 204732562 * | 0.99
-------
seqinfo: 24 sequences from GRCh37 genome
peak %>%
plyranges::join_overlap_left(meth) %>%
arrange(1/(prop_methylated)) %>%
distinctMC(.keep_all = TRUE)GRanges object with 1 range and 1 metadata column:
seqnames ranges strand | prop_methylated
<Rle> <IRanges> <Rle> | <numeric>
[1] chr2 204732401-204732600 * | 0.99
-------
seqinfo: 24 sequences from GRCh37 genome
chopMC()tidyr::chop()
vapply() on the column to summariseGRanges object with 1 range and 1 metadata column:
seqnames ranges strand | prop_methylated
<Rle> <IRanges> <Rle> | <NumericList>
[1] chr2 204732401-204732600 * | 0.20,0.99
-------
seqinfo: 1 sequence from an unspecified genome; no seqlengths

extraChIPs was written for the GRAVI workflowSummarizedExperiment object with multiple ‘assays’
rowRanges elementGRanges / data.frame-like objectsplotAssay*() for inspection of pre-analysed data
plotAssayDensities(), plotAssayPCA(), plotAssayRle()plotOverlaps(), plotPie() and plotSplitDonut()
Venn/UpSet, Pie or Donut chartsplotProfileHeatmap()
plotHFGC(): plot HiC, Features, Genes, Coverage
plotProfileHeatmap()BigWigFileList
getProfileData(bwfl, gr)ggplot2 for easy customisation
ggside
plotHFCG()GvizGenomicInteractions objectGRanges within a named listGRanges within a named listBigWigFileList objects for multiple tracksGRanges to annotate themplotHFGC()Here I’m checking binding for a DE gene (C1orf116)…
plotPie()plotPie(consensus_peaks, fill = "region") +
scale_fill_manual(values = colours$regions) +
theme(legend.position = "none")AR binding sites mapped to genomic regions
plotSplitDonut()GRanges or data.frame objectpatchwork to drive the layoutplotSplitDonut()plotSplitDonut(
object, inner, outer, scale_by = NULL,
r_centre = 0.5, r_inner = 1, r_outer = 1,
total_size = 5, total_glue = "{comma(N)}",
inner_glue = "{inner} {.data[[inner]]}\n{percent(p,0.1)}",
outer_glue = "{outer} {.data[[outer]]}\n{percent(p,0.1)}",
inner_label = c("label", "text", "none"),
outer_label = c("label", "text", "none"),
label_alpha = 1, inner_label_alpha = NULL, outer_label_alpha = NULL,
label_size = 3, inner_label_size = NULL, outer_label_size = NULL,
min_p = 0.05, inner_min_p = NULL, outer_min_p = NULL,
explode_inner = NULL, explode_outer = NULL, explode_query = c("AND", "OR"),
explode_x = 0, explode_y = 0, explode_r = 0, nudge_r = 0.5,
expand = 0.1,
inner_palette = NULL, outer_palette = NULL,
inner_legend = TRUE, outer_legend = TRUE,
layout = c(main = area(1, 1, 6, 6), lg1 = area(2, 7), lg2 = area(4, 7)), ...
)label & tweak how label size is set globallyplotSplitDonut()target <- "GATA3
merged_results %>%
plotSplitDonut(
inner = target, outer = "Region", min_p = 0.01,
inner_glue = "{.data[[inner]]}\nn = {comma(n, 1)}\n{percent(p, 0.1)}",
outer_glue = "{str_wrap(.data[[outer]], 12)}\nn = {comma(n, 1)}\n{percent(p, 0.1)}",
inner_palette = direction_colours, outer_palette = region_colours
)plotSplitDonut()all_windows %>%
plotSplitDonut(
inner = colnames(.)[[1]], outer = colnames(.)[[2]], min_p =0.02,
inner_glue = "{inner}\n{.data[[inner]]}\n{percent(p, 0.1)}",
outer_glue = "{outer}\n{.data[[outer]]}\n{percent(p, 0.1)}",
explode_inner = "Down|Up", explode_outer = "Down|Up", explode_r = 0.4,
explode_query = "OR", nudge_r = 0.5,
inner_palette = colours$direction, outer_palette = colours$direction
inner_legend = FALSE, outer_legend = FALSE
) plotOverlaps()ComplexUpset::upset() or 2) VennDiagram::draw.*wise.venn()R…ex <- list(x = letters[1:5], y = letters[c(6:15, 26)], z = letters[c(2, 10:25)])
plotOverlaps(ex, type = "upset", set_col = 1:3, labeller = stringr::str_to_title)GRAVI workflowGRAVI is to run multiple ChIP targets
Usually DHT Vs Veh or E2+DHT Vs E2
Can I develop a differential signal workflow for all situations?
log(counts/lib.size) ~ XDESeq2 (just like RNA-Seq)edgeR & choose alternative normalisationedgeR::glmQLF())

Use the maximal signal window?
csaw offers multiple options
extraChIPs: Window Mergingcsaw methods wrapped for slightly different output structureharmonicmeanpextraChIPs: Window MergingextraChIPs: Window MergingA ~3kb H3K27ac region unique to harmonic mean

logCPM
limma-trend (Law et al. 2014) using normalised logCPMextraChIPs)quantro (Stephanie C. Hicks and Irizarry 2015) provides a test for








limma-trend similar to voom
Performed a comparison using
edgeR::glmQLF()edgeR::glmQLF()edgeR::glmQLF()limma-trendlimma-trendPerformed using
All merged results using harmonic mean
treat (McCarthy and Smyth 2009)






treat threshold?
treatvoom or limma-trendprior.counts restricting the range for logCPM?(Did I mention my 100x speed-up?)
