library(DiagrammeR)
library(tidyverse)
library(glue)
library(yaml)
library(here)
library(reactable)
library(pander)
config <- read_yaml(here::here("config/config.yml"))
samples <- read_tsv(here::here(config$samples))
This is a simple snakemake
workflow for preparing
single-end ChIP-Seq libraries. The key steps involved are:
bowtie2
to align to the genome (Langmead and Salzberg 2012)markDuplicates
macs2 callpeak
(Zhang et al.
2008) on
Multiple QC reports are created throughout the workflow using FastQC, MultiQC (Ewels et al. 2016) and ngsReports (Ward, To, and Pederson 2020)
A full description of the workflow is available at https://github.com/smped/prepareChIPs, with the rulegraph summarised below.
here::here("workflow", "rules", "rulegraph.dot") %>%
readLines() %>%
str_replace_all("(.+graph.+)(\\];)", "\\1, rankdir = LR\\2") %>%
str_replace_all("fontsize=10", "fontsize=12") %>%
str_replace_all("_", "\n") %>%
str_replace_all("snakemake\ndag", "snakemakg_dag") %>%
(function(x) gsub("(color.+)(style)", "color = \\\"red\\\",\\2", x)) %>%
(function(x) gsub("(compile|create|qc)(.+)(color.+)(style)", "\\1\\2color = \\\"forestgreen\\\",\\4", x)) %>%
(function(x) gsub("(get.+)(color.+)(style)", "\\1color = \\\"royalblue\\\",\\3", x)) %>%
(function(x) gsub("(\\nfastq.+)(color.+)(style)", "\\1color = \\\"red\\\",\\3", x)) %>%
grViz()
The complete set of samples provided is shown in the following table
samples %>%
rename_with(function(x) str_replace_all(x, "_", " "), everything()) %>%
rename_with(str_to_title, everything()) %>%
reactable(
sortable = TRUE, filterable = TRUE, resizable = TRUE,
showPageSizeOptions = TRUE
)
Downloaded files are to be aligned to a genome located in /hpcfs/users/a1018048/refs/gencode-release-33/GRCh37/dna/bt2/ with the prefix GRCh37.primary_assembly.genome
All other parameters were specified to be
adapterremoval:
bowtie2:
fastqc: –nogroup –noextract
macs2:
markduplicates: –REMOVE_DUPLICATES true
multiqc:
samtools:
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C
attached base packages: stats, graphics, grDevices, utils, datasets, methods and base
other attached packages: pander(v.0.6.5), reactable(v.0.4.4), here(v.1.0.1), yaml(v.2.3.7), glue(v.1.6.2), lubridate(v.1.9.2), forcats(v.1.0.0), stringr(v.1.5.0), dplyr(v.1.1.2), purrr(v.1.0.1), readr(v.2.1.4), tidyr(v.1.3.0), tibble(v.3.2.1), ggplot2(v.3.4.2), tidyverse(v.2.0.0) and DiagrammeR(v.1.0.10)
loaded via a namespace (and not attached): tidyselect(v.1.2.0), xfun(v.0.39), bslib(v.0.5.0), colorspace(v.2.1-0), vctrs(v.0.6.3), generics(v.0.1.3), htmltools(v.0.5.5), utf8(v.1.2.3), rlang(v.1.1.1), jquerylib(v.0.1.4), pillar(v.1.9.0), reactR(v.0.4.4), withr(v.2.5.0), bit64(v.4.0.5), RColorBrewer(v.1.1-3), lifecycle(v.1.0.3), munsell(v.0.5.0), gtable(v.0.3.3), visNetwork(v.2.1.2), htmlwidgets(v.1.6.2), evaluate(v.0.21), knitr(v.1.43), tzdb(v.0.4.0), fastmap(v.1.1.1), crosstalk(v.1.2.0), parallel(v.4.2.3), fansi(v.1.0.4), Rcpp(v.1.0.10), scales(v.1.2.1), cachem(v.1.0.8), vroom(v.1.6.3), jsonlite(v.1.8.5), bit(v.4.0.5), hms(v.1.1.3), digest(v.0.6.31), stringi(v.1.7.12), grid(v.4.2.3), rprojroot(v.2.0.3), cli(v.3.6.1), tools(v.4.2.3), magrittr(v.2.0.3), sass(v.0.4.6), crayon(v.1.5.2), pkgconfig(v.2.0.3), ellipsis(v.0.3.2), timechange(v.0.2.0), rmarkdown(v.2.21), rstudioapi(v.0.14), R6(v.2.5.1) and compiler(v.4.2.3)