Use a set of SNPs, insertions and deletions to modify a reference transcriptome
transmogrify(x, var, exons, ...)
# S4 method for XStringSet,GRanges,GRanges
transmogrify(
x,
var,
exons,
alt_col = "ALT",
trans_col = "transcript_id",
omit_ranges = NULL,
tag = NULL,
sep = "_",
var_tags = FALSE,
var_sep = "_",
verbose = TRUE,
mc.cores = 1,
...
)
# S4 method for BSgenome,GRanges,GRanges
transmogrify(
x,
var,
exons,
alt_col = "ALT",
trans_col = "transcript_id",
omit_ranges = NULL,
tag = NULL,
sep = "_",
var_tags = FALSE,
var_sep = "_",
verbose = TRUE,
mc.cores = 1,
...
)
# S4 method for BSgenome,VcfFile,GRanges
transmogrify(
x,
var,
exons,
alt_col = "ALT",
trans_col = "transcript_id",
omit_ranges = NULL,
tag = NULL,
sep = "_",
var_tags = FALSE,
var_sep = "_",
verbose = TRUE,
mc.cores = 1,
which,
...
)
# S4 method for XStringSet,VcfFile,GRanges
transmogrify(
x,
var,
exons,
alt_col = "ALT",
trans_col = "transcript_id",
omit_ranges = NULL,
tag = NULL,
sep = "_",
var_tags = FALSE,
var_sep = "_",
verbose = TRUE,
mc.cores = 1,
which,
...
)
Reference genome as either a DNAStringSet or BSgenome
GRanges object containing the variants
GRanges object with ranges representing exons
Passed to parallel::mclapply
Column from var
containing alternate bases
Column from 'exons' containing the transcript_id
GRanges object containing ranges to omit, such as PAR-Y regions, for example
Optional tag to add to all sequence names which were modified
Separator to place between seqnames names & tag
logical(1) Add tags indicating which type of variant were incorporated, with 's', 'i' and 'd' representing SNPs, Insertions and Deletions respectively
Separator between any previous tags and variant tags
logical(1) Include informative messages, or operate silently
Number of cores to be used when multi-threading via parallel::mclapply
GRanges object passed to VariantAnnotation::ScanVcfParam if using a VCF directly
An XStringSet
Produce a set of variant modified transcript sequences from a standard reference genome. Supported variants are SNPs, Insertions and Deletions
Ranges needing to be masked, such as the Y-chromosome, or Y-PAR can be provided.
It should be noted that this is a time consuming process Inclusion of a large set of insertions and deletions across an entire transcriptome can involve individually modifying many thousands of transcripts, which can be a computationally demanding task. Whilst this can be parallelised using an appropriate number of cores, this may also prove taxing for lower power laptops, and pre-emptively closing memory hungry programs such as Slack, or internet browers may be prudent.
library(GenomicRanges)
library(GenomicFeatures)
seq <- DNAStringSet(c(chr1 = "ACGTAAATGG"))
exons <- GRanges(c("chr1:1-3:-", "chr1:7-9:-"))
exons$transcript_id <- c("trans1")
# When using extractTranscriptSeqs -stranded exons need to be sorted by end
exons <- sort(exons, decreasing = TRUE, by = ~end)
exons
#> GRanges object with 2 ranges and 1 metadata column:
#> seqnames ranges strand | transcript_id
#> <Rle> <IRanges> <Rle> | <character>
#> [1] chr1 7-9 - | trans1
#> [2] chr1 1-3 - | trans1
#> -------
#> seqinfo: 1 sequence from an unspecified genome; no seqlengths
trByExon <- splitAsList(exons, exons$transcript_id)
# Check the sequences
seq
#> DNAStringSet object of length 1:
#> width seq names
#> [1] 10 ACGTAAATGG chr1
extractTranscriptSeqs(seq, trByExon)
#> DNAStringSet object of length 1:
#> width seq names
#> [1] 6 CATCGT trans1
# Define some variants
var <- GRanges(c("chr1:2", "chr1:8"))
var$ALT <- c("A", "GGG")
# Include the variants adding tags to indicate a SNP and indel
# The exons GRanges object will be split by transcript internally
transmogrify(seq, var, exons, var_tags = TRUE)
#> 1 transcripts found with indels
#> DNAStringSet object of length 1:
#> width seq names
#> [1] 8 CCCCTCTT trans1_si