Retrieve a specific module from a Fastqc* object as a data.frame
# S4 method for class 'FastqcData'
getModule(object, module)
# S4 method for class 'FastqcDataList'
getModule(object, module)
# S4 method for class 'ANY'
getModule(object, module)
# S4 method for class 'FastpData'
getModule(object, module)
# S4 method for class 'FastpDataList'
getModule(object, module)
Can be a FastqcData
, fastqcDataList
, or simply
a character
vector of paths
The requested module as contained in a FastQC report. Possible
values are Summary
, Basic_Statistics
,
Per_base_sequence_quality
, Per_tile_sequence_quality
,
Per_sequence_quality_scores
, Per_base_sequence_content
,
Per_sequence_GC_content
, Per_base_N_content
,
Sequence_Length_Distribution
, Sequence_Duplication_Levels
,
Overrepresented_sequences
, Adapter_Content
,
Kmer_Content
, Total_Deduplicated_Percentage
.
Note that spelling and capitalisation is exactly as contained within a
FastQC report, with the exception that spaces have been converted to
underscores. Partial matching is implemented for this argument.
A single tibble
containing module-level information
from all FastQC reports contained in the Fastqc* object.
This function will return a given module from a Fastqc* object as a data.frame. Note that each module will be it's own unique structure, although all will return a data.frame
# Get the files included with the package
packageDir <- system.file("extdata", package = "ngsReports")
fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE)
# Load the FASTQC data as a FastqcDataList object
fdl <- FastqcDataList(fl)
# Extract the Summary module, which corresponds to the PASS/WARN/FAIL flags
getModule(fdl, "Summary")
#> # A tibble: 72 × 3
#> Filename Status Category
#> <chr> <chr> <chr>
#> 1 ATTG_R1.fastq PASS Basic Statistics
#> 2 ATTG_R1.fastq FAIL Per base sequence quality
#> 3 ATTG_R1.fastq WARN Per tile sequence quality
#> 4 ATTG_R1.fastq PASS Per sequence quality scores
#> 5 ATTG_R1.fastq FAIL Per base sequence content
#> 6 ATTG_R1.fastq FAIL Per sequence GC content
#> 7 ATTG_R1.fastq PASS Per base N content
#> 8 ATTG_R1.fastq PASS Sequence Length Distribution
#> 9 ATTG_R1.fastq FAIL Sequence Duplication Levels
#> 10 ATTG_R1.fastq FAIL Overrepresented sequences
#> # ℹ 62 more rows
# The Basic_Statistics module corresponds to the table at the top of each
# FastQC report
getModule(fdl, "Basic_Statistics")
#> # A tibble: 6 × 8
#> Filename Total_Sequences Sequences_flagged_as_poor_qu…¹ Shortest_sequence
#> <chr> <int> <int> <int>
#> 1 ATTG_R1.fastq 24978 0 84
#> 2 ATTG_R2.fastq 24978 0 84
#> 3 CCGC_R1.fastq 22269 0 84
#> 4 CCGC_R2.fastq 22269 0 84
#> 5 GACC_R1.fastq 10287 0 84
#> 6 GACC_R2.fastq 10287 0 84
#> # ℹ abbreviated name: ¹Sequences_flagged_as_poor_quality
#> # ℹ 4 more variables: Longest_sequence <int>, `%GC` <chr>, File_type <chr>,
#> # Encoding <chr>