library(tidyverse)
library(DiagrammeR)
library(pander)
library(yaml)
library(scales)
library(glue)
source(
  here::here("workflow/scripts/plot_dag_functions.R")
)
config <- here::here("config/config.yml") %>%
  read_yaml()
samples <- config$samples$file %>%
  here::here() %>%
  read_tsv()
rep_col <- setdiff(
  colnames(samples),
  c("sample", "target", "treat", "input")
)

Description

This is a standardised workflow for beginning the comparison between two or ChIP targets, using BAM files as the starting point. Treatment groups and targets are specified using config/config.yml.

Workflow

here::here("workflow", "rules", "rulegraph.dot") %>%
  readLines() %>%
  rm_dot_node(node = "\"all\"") %>%
  add_input_node(node = "Alignments\n(Bam Files)", col = "red", ignore = "(download|define|macs2|create|install|update)") %>%
  change_node_colour("(index|macs2|annotations|bedgraph|compile|greylist|coverage)", "red") %>%
  change_node_colour("(create|build|update)", "forestgreen") %>%
  change_node_colour("(download|write|copy|install)", "blue") %>%
  str_replace_all("_", "\n") %>%
  str_replace_all("snakemake\ndag", "snakemake_dag") %>%
  str_replace_all("fontsize=[0-9]+", "fontsize=16") %>%
  str_replace_all("(.+graph.+)(\\];)", "\\1, rankdir = LR\\2") %>%
  grViz()

Summary of processing workflow. The primary data pipeline producing key outputs is shown in red, preparatory steps are shown in blue whilst steps involved in the collation of final output are in green.

Targets

samples %>% 
  group_by(target, treat) %>%
  summarise(n = dplyr::n(), .groups = "drop") %>% 
  rename_all(str_to_title) %>% 
  pander(
    justify = "llr",
    caption = "Summary of supplied ChIP targets and treatment groups"
  )
Summary of supplied ChIP targets and treatment groups
Target Treat N
AR E2 3
AR E2DHT 3
ER E2 3
ER E2DHT 3
H3K27ac E2 3
H3K27ac E2DHT 3

Differential Binding

The specified comparisons are given below, where ‘Vs.’ can be interpreted as a ‘-’ sign. This effectively makes the first listed group as the treatment and the second listed as the reference or control. Differential binding analysis will be performed for every ChIP target where both treatment groups are present.

  • E2DHT Vs. E2

Differential binding will be performed using using smooth-quantile normalisation followed by limma-trend. Changes in signal below 20% will be considered as ‘not of interest’ when performing this analysis.

Pairwise Comparisons

All combinations of the comparisons defined above will will be used to perform pairwise comparisons across ChIP targets and contrasts.

Settings

Genome

External Data

The external data provided was:

  • blacklist: blacklist.bed.gz
  • gtf: gencode.v33lift37.annotation.gtf.gz
  • rnaseq: ZR75_DHT_StrippedSerum_RNASeq_topTable.tsv
  • features: enhancer_atlas_2.0_zr75.gtf.gz
  • coverage: GSM4202320_ZR-75-1_p300_E2.bigwig and GSM4202321_ZR-75-1_p300_E2_DHT.bigwig

Peak Detection

Macs2 callpeak settings were default with the following exceptions:

  • gsize: hs
  • fdr: 0.05
  • keep_duplicates: all

Two additional QC settings were incorporated for peak filtering.

  • For a sample to pass initial QC, the number of identified peaks needs to be < 10-fold (or > 1/10-fold ) the median number of peaks identified in each specific treatment group.
  • Samples with zero peaks were permitted given the possibility of cytoplasmic sequestering of ChIP targets under some conditions
  • For a peak to be included in the set of treatment-specific “Treatment Peaks”, it needs to overlap an identified peak in at least 50% of samples passing the above QC steps

R version 4.2.3 (2023-03-15)

Platform: x86_64-conda-linux-gnu (64-bit)

locale: LC_CTYPE=en_AU.UTF-8, LC_NUMERIC=C, LC_TIME=en_AU.UTF-8, LC_COLLATE=en_AU.UTF-8, LC_MONETARY=en_AU.UTF-8, LC_MESSAGES=en_AU.UTF-8, LC_PAPER=en_AU.UTF-8, LC_NAME=C, LC_ADDRESS=C, LC_TELEPHONE=C, LC_MEASUREMENT=en_AU.UTF-8 and LC_IDENTIFICATION=C

attached base packages: stats, graphics, grDevices, utils, datasets, methods and base

other attached packages: glue(v.1.6.2), scales(v.1.2.1), yaml(v.2.3.7), pander(v.0.6.5), DiagrammeR(v.1.0.10), lubridate(v.1.9.2), forcats(v.1.0.0), stringr(v.1.5.0), dplyr(v.1.1.2), purrr(v.1.0.1), readr(v.2.1.4), tidyr(v.1.3.0), tibble(v.3.2.1), ggplot2(v.3.4.2) and tidyverse(v.2.0.0)

loaded via a namespace (and not attached): tidyselect(v.1.2.0), xfun(v.0.39), bslib(v.0.5.0), colorspace(v.2.1-0), vctrs(v.0.6.3), generics(v.0.1.3), htmltools(v.0.5.5), utf8(v.1.2.3), rlang(v.1.1.1), jquerylib(v.0.1.4), pillar(v.1.9.0), withr(v.2.5.0), bit64(v.4.0.5), RColorBrewer(v.1.1-3), lifecycle(v.1.0.3), munsell(v.0.5.0), gtable(v.0.3.3), visNetwork(v.2.1.2), htmlwidgets(v.1.6.2), evaluate(v.0.21), knitr(v.1.43), tzdb(v.0.4.0), fastmap(v.1.1.1), parallel(v.4.2.3), fansi(v.1.0.4), Rcpp(v.1.0.10), cachem(v.1.0.8), vroom(v.1.6.3), jsonlite(v.1.8.7), bit(v.4.0.5), hms(v.1.1.3), digest(v.0.6.31), stringi(v.1.7.12), grid(v.4.2.3), rprojroot(v.2.0.3), here(v.1.0.1), cli(v.3.6.1), tools(v.4.2.3), magrittr(v.2.0.3), sass(v.0.4.6), crayon(v.1.5.2), pkgconfig(v.2.0.3), ellipsis(v.0.3.2), timechange(v.0.2.0), rmarkdown(v.2.23), rstudioapi(v.0.14), R6(v.2.5.1) and compiler(v.4.2.3)