Retrieve a specific module from a Fastqc* object as a data.frame

# S4 method for class 'FastqcData'
getModule(object, module)

# S4 method for class 'FastqcDataList'
getModule(object, module)

# S4 method for class 'ANY'
getModule(object, module)

# S4 method for class 'FastpData'
getModule(object, module)

# S4 method for class 'FastpDataList'
getModule(object, module)

Arguments

object

Can be a FastqcData, fastqcDataList, or simply a character vector of paths

module

The requested module as contained in a FastQC report. Possible values are Summary, Basic_Statistics, Per_base_sequence_quality, Per_tile_sequence_quality, Per_sequence_quality_scores, Per_base_sequence_content, Per_sequence_GC_content, Per_base_N_content, Sequence_Length_Distribution, Sequence_Duplication_Levels, Overrepresented_sequences, Adapter_Content, Kmer_Content, Total_Deduplicated_Percentage. Note that spelling and capitalisation is exactly as contained within a FastQC report, with the exception that spaces have been converted to underscores. Partial matching is implemented for this argument.

Value

A single tibble containing module-level information from all FastQC reports contained in the Fastqc* object.

Details

This function will return a given module from a Fastqc* object as a data.frame. Note that each module will be it's own unique structure, although all will return a data.frame

Examples

# Get the files included with the package
packageDir <- system.file("extdata", package = "ngsReports")
fl <- list.files(packageDir, pattern = "fastqc.zip", full.names = TRUE)

# Load the FASTQC data as a FastqcDataList object
fdl <- FastqcDataList(fl)

# Extract the Summary module, which corresponds to the PASS/WARN/FAIL flags
getModule(fdl, "Summary")
#> # A tibble: 72 × 3
#>    Filename      Status Category                    
#>    <chr>         <chr>  <chr>                       
#>  1 ATTG_R1.fastq PASS   Basic Statistics            
#>  2 ATTG_R1.fastq FAIL   Per base sequence quality   
#>  3 ATTG_R1.fastq WARN   Per tile sequence quality   
#>  4 ATTG_R1.fastq PASS   Per sequence quality scores 
#>  5 ATTG_R1.fastq FAIL   Per base sequence content   
#>  6 ATTG_R1.fastq FAIL   Per sequence GC content     
#>  7 ATTG_R1.fastq PASS   Per base N content          
#>  8 ATTG_R1.fastq PASS   Sequence Length Distribution
#>  9 ATTG_R1.fastq FAIL   Sequence Duplication Levels 
#> 10 ATTG_R1.fastq FAIL   Overrepresented sequences   
#> # ℹ 62 more rows

# The Basic_Statistics module corresponds to the table at the top of each
# FastQC report
getModule(fdl, "Basic_Statistics")
#> # A tibble: 6 × 8
#>   Filename      Total_Sequences Sequences_flagged_as_poor_qu…¹ Shortest_sequence
#>   <chr>                   <int>                          <int>             <int>
#> 1 ATTG_R1.fastq           24978                              0                84
#> 2 ATTG_R2.fastq           24978                              0                84
#> 3 CCGC_R1.fastq           22269                              0                84
#> 4 CCGC_R2.fastq           22269                              0                84
#> 5 GACC_R1.fastq           10287                              0                84
#> 6 GACC_R2.fastq           10287                              0                84
#> # ℹ abbreviated name: ¹​Sequences_flagged_as_poor_quality
#> # ℹ 4 more variables: Longest_sequence <int>, `%GC` <chr>, File_type <chr>,
#> #   Encoding <chr>