Nextflow Development - Pipeline version control and testing

Objectives
  • Gain an understanding of how version control is utilised throughout the pipeline
  • Use nf-core pipelines lint to check the pipeline against nf-core guidelines

7.1 Version control

In every nf-core module, a versions.yml file is emitted as an output. Let's collect all these files into one channel, ch_versions, which will contain the versions of every tool used in the pipeline. The contents of this channel are then saved in the output/pipeline_info folder as a record of all the software versions used in the pipeline. Currently, only the FASTQC version has been added to ch_versions:

FASTQC:
  fastqc: 0.12.1
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.04.2
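
For context, the versions.yml file is usually written by the module's own script block. The following is a minimal sketch modelled on the pattern used in the nf-core FASTQC module; the process name EXAMPLE_FASTQC and the simplified inputs and outputs are hypothetical and are not taken from this pipeline.

```nextflow
// Hypothetical, simplified module illustrating the versions.yml pattern
process EXAMPLE_FASTQC {
    input:
    tuple val(meta), path(reads)

    output:
    tuple val(meta), path("*.zip"), emit: zip
    path "versions.yml"           , emit: versions

    script:
    """
    fastqc ${reads}

    # Record the tool version under the name of this process
    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        fastqc: \$( fastqc --version | sed '/FastQC v/!d; s/.*v//' )
    END_VERSIONS
    """
}
```

Because the output is declared with emit: versions, the file can be accessed downstream as EXAMPLE_FASTQC.out.versions.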

For the SALMON_INDEX process, the versions file is accessed through the process's .out attribute, as SALMON_INDEX.out.versions. This file is then added to the files already present in ch_versions using the mix operator.

    SALMON_INDEX ( 
        ch_genome_fasta,
        ch_transcript_fasta
    )

    ch_versions = ch_versions.mix(SALMON_INDEX.out.versions)
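
The behaviour of mix can be checked in isolation. The following is a standalone sketch, assuming two placeholder channels stand in for the .out.versions channels of real processes; the names ch_tool_a and ch_tool_b and the file names are hypothetical.

```nextflow
// Standalone sketch: the file names are placeholders, not real files
workflow {
    ch_versions = Channel.empty()

    // Stand-ins for PROCESS.out.versions channels
    ch_tool_a = Channel.of('tool_a_versions.yml')
    ch_tool_b = Channel.of('tool_b_versions.yml')

    // Each mix call returns a new channel holding the items of both inputs
    ch_versions = ch_versions.mix(ch_tool_a)
    ch_versions = ch_versions.mix(ch_tool_b)

    // Prints both items; the order in which they arrive is not guaranteed
    ch_versions.view()
}
```

Starting from Channel.empty() means the same reassignment line can be repeated after every process call, whichever process happens to run first.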

In nf-core, the existing softwareVersionsToYAML function takes all .yml files inside ch_versions and creates one combined file that tracks all software versions.

    //
    // Collate and save software versions
    //
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_'  +  'customrnaseq_software_'  + 'mqc_'  + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }

This file is saved in the pipeline_info folder of the output directory:

output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml
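
The collectFile step can also be tried on its own. The following is a minimal sketch, assuming two hard-coded YAML strings stand in for the output of softwareVersionsToYAML; the strings and the file name versions_demo.yml are illustrative only.

```nextflow
// Standalone sketch: hard-coded strings replace real version entries
workflow {
    Channel.of(
            "FASTQC:\n  fastqc: 0.12.1",
            "SALMON_INDEX:\n  salmon: 1.10.3"
        )
        .collectFile(
            name: 'versions_demo.yml',  // all items are appended to this one file
            sort: true,                 // sort items before writing
            newLine: true               // add a newline after each item
        )
        .view { collated -> collated.text }  // print the collated file contents
}
```

In the pipeline, the additional storeDir option is what places the collated file in the pipeline_info folder.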

In addition to the tool versions, the Nextflow version and the pipeline version are also recorded:

FASTQC:
  fastqc: 0.12.1
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.10.5

Exercise: Add the versions for SALMON_QUANT and GTF2BED to ch_versions. Rerun the pipeline and check that all software versions have been added to the pipeline.

To add the version file for SALMON_QUANT to ch_versions, the .out.versions attribute can be used. This is then added to ch_versions using the .mix operator:

    ch_versions = ch_versions.mix(SALMON_QUANT.out.versions)

Similarly for GTF2BED, the following can be added:

    ch_versions = ch_versions.mix(GTF2BED.out.versions)

Rerunning the pipeline and checking the output file:

nextflow run ./nf-core-customrnaseq/main.nf -resume -profile apptainer --input ./samplesheet.csv --outdir output -params-file ./params.yaml
cat output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml
FASTQC:
  fastqc: 0.12.1
GTF2BED:
  perl: 5.26.2
SALMON_INDEX:
  salmon: 1.10.3
SALMON_QUANT:
  salmon: 1.10.3
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.04.2

As expected, all process versions have now been added to the output YAML file. The workflow file now reads as follows:

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
include { FASTQC                 } from '../modules/nf-core/fastqc/main'
include { MULTIQC                } from '../modules/nf-core/multiqc/main'
include { paramsSummaryMap       } from 'plugin/nf-schema'
include { paramsSummaryMultiqc   } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_customrnaseq_pipeline'

include { SALMON_QUANT           } from '../modules/nf-core/salmon/quant/main'
include { SALMON_INDEX           } from '../modules/nf-core/salmon/index/main'
include { GTF2BED                } from '../modules/local/gtf2bed'

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    RUN MAIN WORKFLOW
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

workflow CUSTOMRNASEQ {

    take:
    ch_samplesheet // channel: samplesheet read in from --input
    main:

    ch_versions = Channel.empty()
    ch_multiqc_files = Channel.empty()

    ch_genome_fasta = Channel.fromPath(params.fasta)
    ch_transcript_fasta = Channel.fromPath(params.transcript_fasta)

    SALMON_INDEX(ch_genome_fasta, ch_transcript_fasta)
    ch_versions = ch_versions.mix(SALMON_INDEX.out.versions)

    ch_gtf = Channel.fromPath(params.gtf)
    def align_mode = false
    def lib_type = "A"

    SALMON_QUANT(
        ch_samplesheet,
        SALMON_INDEX.out.index,
        ch_gtf,
        ch_transcript_fasta,
        align_mode,
        lib_type
    )
    ch_versions = ch_versions.mix(SALMON_QUANT.out.versions)

    GTF2BED( ch_gtf )
    ch_versions = ch_versions.mix(GTF2BED.out.versions)
    
    //
    // MODULE: Run FastQC
    //
    FASTQC (
        ch_samplesheet
    )
    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]})
    ch_versions = ch_versions.mix(FASTQC.out.versions.first())

    //
    // Collate and save software versions
    //
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_'  +  'customrnaseq_software_'  + 'mqc_'  + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }


    //
    // MODULE: MultiQC
    //
    ch_multiqc_config        = Channel.fromPath(
        "$projectDir/assets/multiqc_config.yml", checkIfExists: true)
    ch_multiqc_custom_config = params.multiqc_config ?
        Channel.fromPath(params.multiqc_config, checkIfExists: true) :
        Channel.empty()
    ch_multiqc_logo          = params.multiqc_logo ?
        Channel.fromPath(params.multiqc_logo, checkIfExists: true) :
        Channel.empty()

    summary_params      = paramsSummaryMap(
        workflow, parameters_schema: "nextflow_schema.json")
    ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params))
    ch_multiqc_files = ch_multiqc_files.mix(
        ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
    ch_multiqc_custom_methods_description = params.multiqc_methods_description ?
        file(params.multiqc_methods_description, checkIfExists: true) :
        file("$projectDir/assets/methods_description_template.yml", checkIfExists: true)
    ch_methods_description                = Channel.value(
        methodsDescriptionText(ch_multiqc_custom_methods_description))

    ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions)
    ch_multiqc_files = ch_multiqc_files.mix(
        ch_methods_description.collectFile(
            name: 'methods_description_mqc.yaml',
            sort: true
        )
    )

    MULTIQC (
        ch_multiqc_files.collect(),
        ch_multiqc_config.toList(),
        ch_multiqc_custom_config.toList(),
        ch_multiqc_logo.toList(),
        [],
        []
    )

    emit:
    multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html
    versions       = ch_versions                 // channel: [ path(versions.yml) ]

}

/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    THE END
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

8.1 Other resources

8.1.1 Pipeline linting

The nf-core pipelines lint command checks that a given pipeline follows all nf-core community guidelines. The same checks are run in the automated continuous integration tests, so passing them is important if you would like to contribute to nf-core.

Full documentation is available on how to contribute your pipeline to nf-core.

8.1.2 Pipeline test profiles

Another important feature of nf-core pipelines is their test profiles. Pipeline-level tests make pipelines more reliable and reproducible by checking that identical results are produced at every run. More documentation on testing is available from nf-core.
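
As an illustration, a test profile is essentially a small configuration file that fixes every input. The following is a hypothetical sketch of a conf/test.config for this pipeline. The parameter names input, fasta, transcript_fasta and gtf are the ones used in this workshop, while every path is a placeholder that would need to point at a real, small test dataset.

```nextflow
// conf/test.config: hypothetical sketch, all paths are placeholders
params {
    config_profile_name        = 'Test profile'
    config_profile_description = 'Minimal test dataset to check pipeline function'

    // Placeholder inputs: replace with the location of a small test dataset
    input            = "${projectDir}/assets/test_samplesheet.csv"
    fasta            = "${projectDir}/assets/test_genome.fa"
    transcript_fasta = "${projectDir}/assets/test_transcripts.fa"
    gtf              = "${projectDir}/assets/test_genes.gtf"
}
```

Assuming the pipeline's nextflow.config includes this file under a profile named test, as the nf-core template normally does, the test could be run with -profile test,apptainer --outdir output.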

8.1.3 Pushing to GitHub

So far, we have developed our pipeline locally. Hosting it in a remote repository can further improve the continuous integration process and streamline collaboration when multiple people are working on the same pipeline. Documentation on this is available.


This workshop is adapted from Fundamentals Training, Advanced Training, Developer Tutorials, and Nextflow Patterns materials from Nextflow, nf-core, nf-core tools documentation, and nf-validation.