Parse transcript counts and additional data from salmon

Usage

digestSalmon(
  paths,
  max_sets = 2L,
  aux_dir = "aux_info",
  name_fun = basename,
  verbose = TRUE,
  length_as_assay = FALSE,
  ...
)

Arguments

paths

Vector of file paths to directories containing salmon results

max_sets

The maximum number of distinct transcript sets (i.e. salmon indexes) permitted across all samples

aux_dir

Subdirectory where bootstraps and meta_info.json are stored

name_fun

Function applied to paths to provide colnames in the returned object. Set to NULL or c() to disable.

verbose

Print progress messages

length_as_assay

Output transcript lengths as an assay. May be required if using separate reference transcriptomes for different samples

...

Not used
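
As a small illustration of the name_fun default, the sketch below applies basename to a pair of directory paths. The paths are hypothetical and only base R is used, so this shows the naming step in isolation, not a call to digestSalmon() itself.

```r
# Hypothetical salmon output directories (illustrative only, not real files)
paths <- c("/data/salmon/sampleA", "/data/salmon/sampleB")

# The documented default for name_fun is basename, so each name is the
# final component of the corresponding path
name_fun <- basename
sample_names <- name_fun(paths)
print(sample_names)
```

Any function that maps a character vector of paths to a character vector of the same length could presumably be supplied in its place.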

Value

A SummarizedExperiment object containing assays for counts, scaledCounts, TPM and effectiveLength. The scaledCounts assay contains counts divided by overdispersions. rowData in the returned object will also include transcript lengths along with the overdispersion estimates used to compute the scaled counts.
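
The relationship between counts and scaledCounts can be sketched in base R as below. The count matrix and overdispersion values are made up for illustration, and the object names are assumptions; this is not the internal code of digestSalmon().

```r
# Toy counts: 3 transcripts (rows) x 2 samples (columns); values are made up
counts <- matrix(
  c(10, 20, 30, 40, 50, 60),
  nrow = 3,
  dimnames = list(c("tx1", "tx2", "tx3"), c("sampleA", "sampleB"))
)

# One hypothetical overdispersion estimate per transcript
overdispersion <- c(tx1 = 1, tx2 = 2, tx3 = 5)

# Dividing a matrix by a vector of length nrow() recycles down the columns,
# so every row is divided by its own overdispersion
scaled_counts <- counts / overdispersion
print(scaled_counts)
```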

Details

This function is based heavily on edgeR::catchSalmon() with some important exceptions:

  1. A SummarizedExperiment object is returned

  2. Differing numbers of transcripts are allowed between samples

The second point is intended for the scenario where some samples may have been aligned to a full reference, with remaining samples aligned to a partially masked reference (e.g. chrY). This will lead to differing numbers of transcripts within each salmon index. However, common estimates of overdispersions are still required for scaling transcript-level counts. By default, the function will error if more than 2 different sets of transcripts are detected; this can be modified using the max_sets argument.
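
The check on the number of transcript sets might look something like the following base R sketch. The transcript IDs, sample names and error message are hypothetical, and the way sets are compared here is an assumption, not the function's actual implementation.

```r
# Hypothetical transcript IDs per sample; sampleC stands in for a sample
# quantified against a partially masked reference
tx_ids <- list(
  sampleA = c("tx1", "tx2", "txY"),
  sampleB = c("tx1", "tx2", "txY"),
  sampleC = c("tx1", "tx2")
)
max_sets <- 2L

# Collapse each sample's IDs into a single key so identical sets match
set_keys <- vapply(
  tx_ids, function(x) paste(sort(x), collapse = ";"), character(1)
)
n_sets <- length(unique(set_keys))

# Mirror the documented behaviour: error when too many distinct sets are found
if (n_sets > max_sets) {
  stop("Found ", n_sets, " transcript sets, but max_sets = ", max_sets)
}
print(n_sets)
```

With these toy inputs two distinct sets are found, so the default max_sets = 2L would not trigger an error.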

The returned SummarizedExperiment object will also contain multiple assays, as described in the Value section.