Skip to contents

Use a set of SNPs, insertions and deletions to modify a reference transcriptome

Usage

transmogrify(x, var, exons, ...)

# S4 method for class 'XStringSet,GRanges,GRanges'
transmogrify(
  x,
  var,
  exons,
  alt_col = "ALT",
  trans_col = "transcript_id",
  omit_ranges = NULL,
  tag = NULL,
  sep = "_",
  var_tags = FALSE,
  var_sep = "_",
  ol_vars = "fail",
  verbose = TRUE,
  mc.cores = 1,
  ...
)

# S4 method for class 'BSgenome,GRanges,GRanges'
transmogrify(
  x,
  var,
  exons,
  alt_col = "ALT",
  trans_col = "transcript_id",
  omit_ranges = NULL,
  tag = NULL,
  sep = "_",
  var_tags = FALSE,
  var_sep = "_",
  ol_vars = "fail",
  verbose = TRUE,
  mc.cores = 1,
  ...
)

# S4 method for class 'BSgenome,VcfFile,GRanges'
transmogrify(
  x,
  var,
  exons,
  alt_col = "ALT",
  trans_col = "transcript_id",
  omit_ranges = NULL,
  tag = NULL,
  sep = "_",
  var_tags = FALSE,
  var_sep = "_",
  ol_vars = "fail",
  verbose = TRUE,
  mc.cores = 1,
  which,
  ...
)

# S4 method for class 'XStringSet,VcfFile,GRanges'
transmogrify(
  x,
  var,
  exons,
  alt_col = "ALT",
  trans_col = "transcript_id",
  omit_ranges = NULL,
  tag = NULL,
  sep = "_",
  var_tags = FALSE,
  var_sep = "_",
  ol_vars = "fail",
  verbose = TRUE,
  mc.cores = 1,
  which,
  ...
)

Arguments

x

Reference genome as either a DNAStringSet or BSgenome

var

GRanges object containing the variants

exons

GRanges object with ranges representing exons

...

Passed to parallel::mclapply

alt_col

Column from var containing alternate bases

trans_col

Column from 'exons' containing the transcript_id

omit_ranges

GRanges object containing ranges to omit, such as PAR-Y regions, for example

tag

Optional tag to add to all sequence names which were modified

sep

Separator to place between seqnames names & tag

var_tags

logical(1) Add tags indicating which type of variant were incorporated, with 's', 'i' and 'd' representing SNPs, Insertions and Deletions respectively

var_sep

Separator between any previous tags and variant tags

ol_vars

Error handling for any overlapping variants. See cleanVariants for possible values and an explanation

verbose

logical(1) Include informative messages, or operate silently

mc.cores

Number of cores to be used when multi-threading via parallel::mclapply

which

GRanges object passed to VariantAnnotation::ScanVcfParam if using a VCF directly

Value

An XStringSet

Details

Produce a set of variant modified transcript sequences from a standard reference genome. Supported variants are SNPs, Insertions and Deletions

Ranges needing to be masked, such as the Y-chromosome, or Y-PAR can be provided.

It should be noted that this is a time consuming process Inclusion of a large set of insertions and deletions across an entire transcriptome can involve individually modifying many thousands of transcripts, which can be a computationally demanding task. Whilst this can be parallelised using an appropriate number of cores, this may also prove taxing for lower power laptops, and pre-emptively closing memory hungry programs such as Slack, or internet browers may be prudent.

Examples

library(GenomicRanges)
library(GenomicFeatures)
seq <- DNAStringSet(c(chr1 = "ACGTAAATGG"))
exons <- GRanges(c("chr1:1-3:-", "chr1:7-9:-"))
exons$transcript_id <- c("trans1")

# When using extractTranscriptSeqs -stranded exons need to be sorted by end
exons <- sort(exons, decreasing = TRUE, by = ~end)
exons
#> GRanges object with 2 ranges and 1 metadata column:
#>       seqnames    ranges strand | transcript_id
#>          <Rle> <IRanges>  <Rle> |   <character>
#>   [1]     chr1       7-9      - |        trans1
#>   [2]     chr1       1-3      - |        trans1
#>   -------
#>   seqinfo: 1 sequence from an unspecified genome; no seqlengths
trByExon <- splitAsList(exons, exons$transcript_id)

# Check the sequences
seq
#> DNAStringSet object of length 1:
#>     width seq                                               names               
#> [1]    10 ACGTAAATGG                                        chr1
extractTranscriptSeqs(seq, trByExon)
#> DNAStringSet object of length 1:
#>     width seq                                               names               
#> [1]     6 CATCGT                                            trans1

# Define some variants
var <- GRanges(c("chr1:2", "chr1:8"))
var$ALT <- c("A", "GGG")

# Include the variants adding tags to indicate a SNP and indel
# The exons GRanges object will be split by transcript internally
transmogrify(seq, var, exons, var_tags = TRUE)
#> 1 transcripts found with indels
#> DNAStringSet object of length 1:
#>     width seq                                               names               
#> [1]     8 CCCCTCTT                                          trans1_si