Parse transcript counts and additional data from salmon
Usage
digestSalmon(
paths,
max_sets = 2L,
aux_dir = "aux_info",
name_fun = basename,
verbose = TRUE,
length_as_assay = FALSE,
...
)
Arguments
- paths
Vector of file paths to directories containing salmon results
- max_sets
The maximum number of distinct sets of transcripts (i.e. salmon indexes) permitted across all samples
- aux_dir
Subdirectory where bootstraps and meta_info.json are stored
- name_fun
Function applied to paths to provide colnames in the returned object. Set to NULL or c() to disable.
- verbose
Print progress messages
- length_as_assay
Output transcript lengths as an assay. May be required if using separate reference transcriptomes for different samples
- ...
Not used
Value
A SummarizedExperiment object containing assays for counts, scaledCounts, TPM and effectiveLength. The scaledCounts assay contains counts divided by the transcript-level overdispersion estimates. The rowData of the returned object will also include transcript lengths, along with the overdispersion estimates used to obtain the scaled counts.
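The relationship between counts and scaledCounts can be illustrated with a minimal base R sketch. The matrix, sample names and overdispersion values below are invented purely for illustration, and this is not the function's internal code; it only shows the "counts divided by overdispersions" arithmetic on toy data.

```r
# Toy data: 3 transcripts x 2 samples (all values are invented)
counts <- matrix(
  c(10, 40, 6,
    20, 80, 9),
  nrow = 3,
  dimnames = list(c("tx1", "tx2", "tx3"), c("sampleA", "sampleB"))
)

# One hypothetical overdispersion estimate per transcript
overdispersion <- c(tx1 = 1, tx2 = 4, tx3 = 1.5)

# Divide every row (transcript) by its own overdispersion estimate
scaled_counts <- sweep(counts, MARGIN = 1, STATS = overdispersion, FUN = "/")
scaled_counts
```

Here tx2 has the largest assumed overdispersion, so its counts shrink the most after scaling, while tx1 (overdispersion of 1) is unchanged.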
Details
This function is based heavily on edgeR::catchSalmon() with some important exceptions:
- A SummarizedExperiment object is returned
- Differing numbers of transcripts are allowed between samples
The second point is intended for the scenario where some samples may have been aligned to a full reference, with the remaining samples aligned to a partially masked reference (e.g. with chrY masked). This will lead to differing numbers of transcripts within each salmon index. However, common estimates of overdispersions are still required for scaling transcript-level counts. By default, the function will error if more than two different sets of transcripts are detected, although this limit can be changed using the max_sets argument.
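The idea behind the max_sets limit can be sketched in base R as follows. This is only an illustration under stated assumptions: the transcript IDs are invented, and count_tx_sets() and check_tx_sets() are hypothetical helpers written for this example, not functions from this or any other package, and not necessarily how the check is implemented internally.

```r
# Hypothetical transcript IDs per sample; sampleC mimics a masked reference
tx_ids <- list(
  sampleA = c("tx1", "tx2", "txY"),
  sampleB = c("tx1", "tx2", "txY"),
  sampleC = c("tx1", "tx2")
)

# Hypothetical helper: count the distinct sets of transcript IDs
count_tx_sets <- function(ids) {
  keys <- vapply(ids, function(x) paste(sort(x), collapse = ";"), character(1))
  length(unique(keys))
}

# Hypothetical helper: error if more sets are found than max_sets allows
check_tx_sets <- function(ids, max_sets = 2L) {
  n_sets <- count_tx_sets(ids)
  if (n_sets > max_sets) {
    stop(n_sets, " sets of transcripts detected, but max_sets = ", max_sets)
  }
  invisible(n_sets)
}

n_sets <- check_tx_sets(tx_ids)   # two distinct sets, within the default limit
all_tx <- Reduce(union, tx_ids)   # transcripts seen in at least one sample
```

With these toy inputs, calling check_tx_sets(tx_ids, max_sets = 1L) would raise an error, since two distinct sets of transcripts are present.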
The SummarizedExperiment object returned will also contain multiple assays, as described in the Value section.