Chapter 3 Quick Start Guide

3.1 Install Snakemake

You will need a Snakemake installation to begin. Please see here for help setting this up. If you are running the pipeline on an HPC and are unsure how to proceed, please consult your HPC support team about setting up a Snakemake profile for your specific cluster.

3.2 Create the Directory Structure

  1. Create a new GitHub repository on your account from the GitHub template repository
  2. Clone your new repository to your local server or HPC using git clone <myrepository>
  3. Place your BAM files in the subdirectory data/bam or data/aligned, as described in section 4.2
  4. Edit samples.tsv in the config directory as described in section 4.3
  5. Ensure you have the blacklist as a BED file and the annotations as a GTF file
  6. Modify any parameters in config/config.yml
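
Once the steps above are complete, a quick check of the repository layout can catch a missing input before the workflow starts. The following is a minimal sketch rather than part of GRAVI: the function name check_gravi_layout and its messages are hypothetical, and it only tests for the paths named in the list above. The blacklist and annotation files are not checked, since their locations are not given in this section.

```shell
# Sketch only: check_gravi_layout is a hypothetical helper, not a GRAVI command.
# Usage: check_gravi_layout /path/to/myrepository
check_gravi_layout() {
    local repo="${1:-.}"
    local missing=0
    local f

    # BAM files may be placed in either data/bam or data/aligned (step 3)
    if [ ! -d "$repo/data/bam" ] && [ ! -d "$repo/data/aligned" ]; then
        echo "missing: data/bam or data/aligned"
        missing=1
    fi

    # Sample sheet and main configuration file (steps 4 and 6)
    for f in config/samples.tsv config/config.yml; do
        if [ ! -f "$repo/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done

    # Returns 0 when everything was found, 1 otherwise
    return "$missing"
}
```

Calling check_gravi_layout with the path to your cloned repository prints one line per missing item and returns a non-zero status if anything is absent.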

3.3 Run the Pipeline

3.3.1 Run On A Local Server

To run using 16 cores without any queuing system (e.g. on a local machine), enter the following:

snakemake -p --use-conda --notemp --keep-going --rerun-triggers mtime --cores 16

3.3.2 Run On An HPC

Please consult your local support team for advice on running a Snakemake workflow. In essence, the above command will need to be submitted to your queuing system using the strategy preferred on your cluster. The Snakemake profile required will generally be stable across all workflows, but setting it up may require expertise from the technical support team.
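
As one illustration of handing the command to a queuing system, the sketch below writes a job script that runs the whole workflow inside a single allocation. It assumes a SLURM scheduler, which this guide does not prescribe. The file name run_gravi.sbatch, the job name and the requested time are placeholders, and your cluster may need further options such as memory, partition or account. A Snakemake profile that submits each rule as its own job is a different strategy and is not shown here.

```shell
# Sketch only: assumes SLURM. The file name, job name and resource
# requests below are placeholders to adapt to your own cluster.
cat > run_gravi.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=gravi
#SBATCH --cpus-per-task=16
#SBATCH --time=24:00:00

# Same command as for a local server, run inside one allocation
snakemake -p --use-conda --notemp --keep-going --rerun-triggers mtime --cores 16
EOF

echo "Job script written to run_gravi.sbatch"
```

Under these assumptions the job would be submitted from the repository root with sbatch run_gravi.sbatch, keeping --cores equal to the number of CPUs requested.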

3.4 Tips And Tricks

3.4.1 Removing Large Files

Some large files, such as R Environments and BedGraph files, are marked as temp files internally. Because the commands in section 3.3 pass --notemp, these files are kept while the workflow runs. They can be removed after completion of the workflow using

snakemake --delete-temp-output --cores 1

3.4.2 Shared Conda Environments

Conda environments can easily become bloated. If you are running multiple GRAVI analyses, it may be simpler to host a common set of conda environments and avoid recreating them for each analysis. This can be done by adding the argument --conda-prefix '/path/to/my/envs/' to the call to snakemake.

Conda environments can also be built prior to running the workflow using the standalone command:

snakemake \
    --use-conda \
    --conda-prefix '/path/to/my/envs/' \
    --conda-create-envs-only \
    --cores 1

3.4.3 Running Restricted Sections of the Workflow

Snakemake can run a workflow up to a chosen point using the argument --until followed by the rule at which you wish to stop. For example, the argument --until compile_macs2_summary_html would run the workflow only until the macs2 summaries are compiled, which may be preferable for checking QC before proceeding to differential expression and pairwise comparisons.
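
Combined with the command from section 3.3.1, a partial run could be wrapped as in the sketch below. The function name run_until and the SNAKEMAKE and CORES variables are hypothetical conveniences, not part of GRAVI or Snakemake. Only the flags already shown in this chapter are used, and any extra arguments are passed through unchanged.

```shell
# Sketch only: run_until is a hypothetical wrapper around the local-server command.
# $1 is the rule to stop at; any further arguments go straight to snakemake.
run_until() {
    local rule="$1"
    shift
    # SNAKEMAKE and CORES are optional overrides, defaulting to
    # the snakemake on PATH and 16 cores respectively
    "${SNAKEMAKE:-snakemake}" -p --use-conda --notemp --keep-going \
        --rerun-triggers mtime --cores "${CORES:-16}" --until "$rule" "$@"
}
```

For example, run_until compile_macs2_summary_html would stop once the macs2 summaries are compiled. Running the full command from section 3.3.1 afterwards should then continue with the remaining steps.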