Home

library(DiagrammeR)
library(tidyverse)
library(glue)
library(yaml)
library(here)
library(reactable)
library(pander)

config <- read_yaml(here::here("config/config.yml"))
samples <- read_tsv(here::here(config$samples))

Introduction

This is a simple snakemake workflow for preparing single-end ChIP-Seq libraries. The key steps involved are:

Download files from the Sequence Read Archive
Remove adapter sequences and low quality reads using AdapterRemoval (Schubert, Lindgreen, and Orlando 2016)
Use bowtie2 to align to the genome (Langmead and Salzberg 2012)
De-duplicate reads is markDuplicates
Run macs2 callpeak (Zhang et al. 2008) on
1. All individual samples
2. All samples within a common target and treatment group

Multiple QC reports are created throughout the workflow using FastQC, MultiQC (Ewels et al. 2016) and ngsReports (Ward, To, and Pederson 2020)

A full description of the workflow is available at https://github.com/smped/prepareChIPs, with the rulegraph summarised below.

here::here("workflow", "rules", "rulegraph.dot") %>%
  readLines() %>%
  str_replace_all("(.+graph.+)(\\];)", "\\1, rankdir = LR\\2") %>% 
  str_replace_all("fontsize=10", "fontsize=12") %>% 
  str_replace_all("_", "\n") %>% 
  str_replace_all("snakemake\ndag", "snakemakg_dag") %>%
  (function(x) gsub("(color.+)(style)", "color = \\\"red\\\",\\2", x)) %>%
  (function(x) gsub("(compile|create|qc)(.+)(color.+)(style)", "\\1\\2color = \\\"forestgreen\\\",\\4", x)) %>% 
  (function(x) gsub("(get.+)(color.+)(style)", "\\1color = \\\"royalblue\\\",\\3", x)) %>%
  (function(x) gsub("(\\nfastq.+)(color.+)(style)", "\\1color = \\\"red\\\",\\3", x)) %>%
  grViz()

Snakemake rulegraph for the prepareChIPs workflow. The primary data processing workflow is showin in red, with QC and reporting steps shown in green. Any steps which produce additional, custom statistics are shown in blue.

Provided Samples

The complete set of samples provided is shown in the following table

samples %>% 
  rename_with(function(x) str_replace_all(x, "_", " "), everything()) %>% 
  rename_with(str_to_title, everything()) %>% 
  reactable(
    sortable = TRUE, filterable = TRUE, resizable = TRUE,
    showPageSizeOptions = TRUE
  )

Workflow Parameters

Downloaded files are to be aligned to a genome located in /hpcfs/users/a1018048/refs/gencode-release-33/GRCh37/dna/bt2/ with the prefix GRCh37.primary_assembly.genome

All other parameters were specified to be

adapterremoval:
- extra: –maxns 1 –minlength 50 –minquality 30 –gzip –trimqualities
- adapter1: –adapter1 AGATCGGAAGAGC
bowtie2:
fastqc: –nogroup –noextract
macs2:
- callpeak: -g hs –keep-dup all -q 0.05
- bdgcmp: -m FE
markduplicates: –REMOVE_DUPLICATES true
multiqc:
samtools:
- sort:
- view: -q2
- index:

References

Ewels, Philip, Måns Magnusson, Sverker Lundin, and Max Käller. 2016. “MultiQC: Summarize Analysis Results for Multiple Tools and Samples in a Single Report.” Bioinformatics 32 (19): 3047–48.

Langmead, Ben, and Steven L Salzberg. 2012. “Fast Gapped-Read Alignment with Bowtie 2.” Nat. Methods 9 (4): 357–59.

Schubert, Mikkel, Stinus Lindgreen, and Ludovic Orlando. 2016. “AdapterRemoval V2: Rapid Adapter Trimming, Identification, and Read Merging.” BMC Res. Notes 9 (February): 88.

Ward, Christopher M, Thu-Hien To, and Stephen M Pederson. 2020. “ngsReports: A Bioconductor Package for Managing FastQC Reports and Other NGS Related Log Files.” Bioinformatics 36 (8): 2587–88.

Zhang, Yong, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, et al. 2008. “Model-Based Analysis of ChIP-Seq (MACS).” Genome Biol. 9 (9): R137.

R version 4.2.3 (2023-03-15)

Platform: x86_64-conda-linux-gnu (64-bit)

locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: pander(v.0.6.5), reactable(v.0.4.4), here(v.1.0.1), yaml(v.2.3.7), glue(v.1.6.2), lubridate(v.1.9.2), forcats(v.1.0.0), stringr(v.1.5.0), dplyr(v.1.1.2), purrr(v.1.0.1), readr(v.2.1.4), tidyr(v.1.3.0), tibble(v.3.2.1), ggplot2(v.3.4.2), tidyverse(v.2.0.0) and DiagrammeR(v.1.0.10)

loaded via a namespace (and not attached): tidyselect(v.1.2.0), xfun(v.0.39), bslib(v.0.5.0), colorspace(v.2.1-0), vctrs(v.0.6.3), generics(v.0.1.3), htmltools(v.0.5.5), utf8(v.1.2.3), rlang(v.1.1.1), jquerylib(v.0.1.4), pillar(v.1.9.0), reactR(v.0.4.4), withr(v.2.5.0), bit64(v.4.0.5), RColorBrewer(v.1.1-3), lifecycle(v.1.0.3), munsell(v.0.5.0), gtable(v.0.3.3), visNetwork(v.2.1.2), htmlwidgets(v.1.6.2), evaluate(v.0.21), knitr(v.1.43), tzdb(v.0.4.0), fastmap(v.1.1.1), crosstalk(v.1.2.0), parallel(v.4.2.3), fansi(v.1.0.4), Rcpp(v.1.0.10), scales(v.1.2.1), cachem(v.1.0.8), vroom(v.1.6.3), jsonlite(v.1.8.5), bit(v.4.0.5), hms(v.1.1.3), digest(v.0.6.31), stringi(v.1.7.12), grid(v.4.2.3), rprojroot(v.2.0.3), cli(v.3.6.1), tools(v.4.2.3), magrittr(v.2.0.3), sass(v.0.4.6), crayon(v.1.5.2), pkgconfig(v.2.0.3), ellipsis(v.0.3.2), timechange(v.0.2.0), rmarkdown(v.2.21), rstudioapi(v.0.14), R6(v.2.5.1) and compiler(v.4.2.3)

Home

12 July, 2023

Introduction

Provided Samples

Workflow Parameters

References