Nextflow Development - Pipeline version control and testing

Objectives

Gain an understanding of how version control is utilised throughout the pipeline
Use nf-core lint to lint the pipeline

7.1 Version control

In every nf-core module, a versions.yml file has been emitted as output. Let’s collect all these files together into one channel ch_versions, which will contain the versions used for every tool in the pipeline. This channel is then saved in the output/pipeline_info folder, and records all the software versions used in the pipeline. Currently, only the FASTQC version has been added to ch_versions

FASTQC:
  fastqc: 0.12.1
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.04.2

For the SALMON_INDEX process, this can be done by using the .out attribute, along with versions. This file is then added to the list of files already present in ch_versions, using the .mix operator.

    SALMON_INDEX ( 
        ch_genome_fasta,
        ch_transcript_fasta
    )

    ch_versions = ch_versions.mix(SALMON_INDEX.out.versions)

In nf-core, the existing softwareVersionsToYAML function will take all .yml files inside ch_versions, creating one large file that traks all software versions.

    //
    // Collate and save software versions
    //
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_'  +  'customrnaseq_software_'  + 'mqc_'  + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }

This file is saved in the pipeline_info folder of the output directory:

output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml

In addition to tool versions used, the Nextflow version, and pipeline version is also recorded

FASTQC:
  fastqc: 0.12.1
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.10.5

Exercise: Add the versions for SALMON_QUANT and GTF2BED to ch_versions. Rerun the pipeline and check that all software versions have been added to the pipeline.

Solution

To add the version file for SALMON_QUANT to ch_versions, the .out.versions attribute can be used. This is then added to ch_versions using the .mix operator:

    ch_versions = ch_versions.mix(SALMON_QUANT.out.versions)

Similarly for GTF2BED, the following can be added:

    ch_versions = ch_versions.mix(GTF2BED.out.versions)

Rerunning the pipeline and checking the output file:

nextflow run ./nf-core-customrnaseq/main.nf -resume  -profile apptainer --input ./samplesheet.csv --outdir output -params-file ./params.yaml

cat output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml

FASTQC:
  fastqc: 0.12.1
GTF2BED:
  perl: 5.26.2
SALMON_INDEX:
  salmon: 1.10.3
SALMON_QUANT:
  salmon: 1.10.3
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.04.2

As expected, all process versions have now been added to the outout YAML file.

nf-core-customrnaseq/workflows/customrnaseq.nf

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
include { FASTQC                 } from '../modules/nf-core/fastqc/main'
include { MULTIQC                } from '../modules/nf-core/multiqc/main'
include { paramsSummaryMap       } from 'plugin/nf-schema'
include { paramsSummaryMultiqc   } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_customrnaseq_pipeline'

include { SALMON_QUANT                 } from '../modules/nf-core/salmon/quant/main'
include { SALMON_INDEX                } from '../modules/nf-core/salmon/index/main'
include { GTF2BED                              } from '../modules/local/gtf2bed'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    RUN MAIN WORKFLOW
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

workflow CUSTOMRNASEQ {

    take:
    ch_samplesheet // channel: samplesheet read in from --input
    main:

    ch_versions = Channel.empty()
    ch_multiqc_files = Channel.empty()

    ch_genome_fasta = Channel.fromPath(params.fasta)
    ch_transcript_fasta = Channel.fromPath(params.transcript_fasta)

    SALMON_INDEX(ch_genome_fasta, ch_transcript_fasta)
    ch_versions = ch_versions.mix(SALMON_INDEX.out.versions)

    ch_gtf = Channel.fromPath(params.gtf)
    def align_mode = false
    def lib_type = "A"

    SALMON_QUANT(
        ch_samplesheet,
        SALMON_INDEX.out.index,
        ch_gtf,
        ch_transcript_fasta,
        align_mode,
        lib_type
    )
    ch_versions = ch_versions.mix(SALMON_QUANT.out.versions)

    GTF2BED( ch_gtf )
    ch_versions = ch_versions.mix(GTF2BED.out.versions)
    
    //
    // MODULE: Run FastQC
    //
    FASTQC (
        ch_samplesheet
    )
    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]})
    ch_versions = ch_versions.mix(FASTQC.out.versions.first())

    //
    // Collate and save software versions
    //
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_'  +  'customrnaseq_software_'  + 'mqc_'  + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }


    //
    // MODULE: MultiQC
    //
    ch_multiqc_config        = Channel.fromPath(
        "$projectDir/assets/multiqc_config.yml", checkIfExists: true)
    ch_multiqc_custom_config = params.multiqc_config ?
        Channel.fromPath(params.multiqc_config, checkIfExists: true) :
        Channel.empty()
    ch_multiqc_logo          = params.multiqc_logo ?
        Channel.fromPath(params.multiqc_logo, checkIfExists: true) :
        Channel.empty()

    summary_params      = paramsSummaryMap(
        workflow, parameters_schema: "nextflow_schema.json")
    ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params))
    ch_multiqc_files = ch_multiqc_files.mix(
        ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
    ch_multiqc_custom_methods_description = params.multiqc_methods_description ?
        file(params.multiqc_methods_description, checkIfExists: true) :
        file("$projectDir/assets/methods_description_template.yml", checkIfExists: true)
    ch_methods_description                = Channel.value(
        methodsDescriptionText(ch_multiqc_custom_methods_description))

    ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions)
    ch_multiqc_files = ch_multiqc_files.mix(
        ch_methods_description.collectFile(
            name: 'methods_description_mqc.yaml',
            sort: true
        )
    )

    MULTIQC (
        ch_multiqc_files.collect(),
        ch_multiqc_config.toList(),
        ch_multiqc_custom_config.toList(),
        ch_multiqc_logo.toList(),
        [],
        []
    )

    emit:multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html
    versions       = ch_versions                 // channel: [ path(versions.yml) ]

}

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    THE END
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

8.1 Other resources

8.1.1 Pipeline linting

The nf-core pipelines lint command can be used to check that a given pipeline follow all nf-core community guidelines. This is the same test that is used on the automated continuous integration tests, and is important if you would like to contribute to nf-core.

To contribute your pipeline to nf-core, full documentation is available.

8.1.2 Pipeline test profiles

Another important feature of nf-core pipelines are their test profiles. Pipeline level tests can facilitate more reliable and reproducible pipelines by ensuring identical results are produced at every run. More documentation from nf-core is available here and here

8.1.2 Pushing to GitHub

Currently, we have developed our pipeline locally. However, creating a remote repository can further improve the continuous integration process and streamline work if multiple people are working on the same pipeline. See documentation available here.

^{This workshop is adapted from Fundamentals Training, Advanced Training, Developer Tutorials, Nextflow Patterns materials from Nextflow, nf-core nf-core tools documentation and nf-validation} –>