Nextflow Development - Pipeline version control and testing
- Gain an understanding of how version control is utilised throughout the pipeline
- Use nf-core pipelines lint to lint the pipeline
7.1 Version control
Every nf-core module emits a versions.yml file as output. Let’s collect all of these files into one channel, ch_versions, which will hold the versions of every tool used in the pipeline. The contents of this channel are then saved in the output/pipeline_info folder as a record of all the software versions used. Currently, only the FASTQC version has been added to ch_versions:
FASTQC:
    fastqc: 0.12.1
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.04.2
For the SALMON_INDEX process, the versions.yml file is available through the .out attribute as SALMON_INDEX.out.versions. It is then added to the files already present in ch_versions using the .mix operator:
SALMON_INDEX (
    ch_genome_fasta,
    ch_transcript_fasta
)
ch_versions = ch_versions.mix(SALMON_INDEX.out.versions)
In nf-core, the existing softwareVersionsToYAML function processes all the .yml files inside ch_versions. Combined with collectFile, this creates one file that tracks all software versions:
//
// Collate and save software versions
//
softwareVersionsToYAML(ch_versions)
    .collectFile(
        storeDir: "${params.outdir}/pipeline_info",
        name: 'nf_core_' + 'customrnaseq_software_' + 'mqc_' + 'versions.yml',
        sort: true,
        newLine: true
    ).set { ch_collated_versions }
This file is saved in the pipeline_info folder of the output directory:
output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml
In addition to the tool versions, the Nextflow version and the pipeline version are also recorded:
FASTQC:
    fastqc: 0.12.1
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.10.5
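To make the collation step more concrete, here is a minimal Python sketch of the idea. It is not the nf-core implementation (that is the Groovy function softwareVersionsToYAML together with collectFile). It only illustrates how several per-process version records can be merged into one sorted file. The function name collate_versions and the dictionary layout are assumptions made for this illustration.

```python
def collate_versions(version_maps):
    """Merge per-process {process: {tool: version}} mappings into one YAML-style string."""
    merged = {}
    for mapping in version_maps:
        for process, tools in mapping.items():
            # A process may report its versions more than once (e.g. once per sample);
            # repeated records are merged into a single section.
            merged.setdefault(process, {}).update(tools)

    lines = []
    for process in sorted(merged):  # sorted so the file is stable between runs
        lines.append(f"{process}:")
        for tool in sorted(merged[process]):
            lines.append(f"    {tool}: {merged[process][tool]}")
    return "\n".join(lines) + "\n"


# Hypothetical stand-ins for the versions.yml files emitted by two processes
fastqc = {"FASTQC": {"fastqc": "0.12.1"}}
salmon_index = {"SALMON_INDEX": {"salmon": "1.10.3"}}

collated = collate_versions([salmon_index, fastqc, fastqc])
print(collated)
```

Sorting the sections mirrors the purpose of sort: true in the collectFile call: the collated file does not depend on the order in which processes happen to finish.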
Exercise: Add the versions for SALMON_QUANT and GTF2BED to ch_versions. Rerun the pipeline and check that all software versions have been added to the collated versions file.
To add the version file for SALMON_QUANT to ch_versions, the .out.versions attribute can be used. This is then added to ch_versions using the .mix operator:
ch_versions = ch_versions.mix(SALMON_QUANT.out.versions)
Similarly, for GTF2BED, the following can be added:
ch_versions = ch_versions.mix(GTF2BED.out.versions)
Rerunning the pipeline and checking the output file:
nextflow run ./nf-core-customrnaseq/main.nf -resume -profile apptainer --input ./samplesheet.csv --outdir output -params-file ./params.yaml
cat output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml
FASTQC:
    fastqc: 0.12.1
GTF2BED:
    perl: 5.26.2
SALMON_INDEX:
    salmon: 1.10.3
SALMON_QUANT:
    salmon: 1.10.3
Workflow:
    nf-core/customrnaseq: v1.0.0dev
    Nextflow: 24.04.2
As expected, all process versions have now been added to the output YAML file.
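Checking the file by eye works for four processes, but the same check can be scripted. The following is a small Python sketch under two assumptions: the collated file uses unindented section names ending in a colon (as in the output above), and the helper name missing_processes is made up for this example.

```python
def missing_processes(versions_text, expected):
    """Return the expected process names that have no top-level section in the text."""
    found = set()
    for line in versions_text.splitlines():
        # Top-level sections are unindented lines ending with a colon, e.g. "FASTQC:"
        if line and not line[0].isspace() and line.rstrip().endswith(":"):
            found.add(line.rstrip()[:-1])
    return [name for name in expected if name not in found]


# Example text standing in for the collated file before SALMON_QUANT was added
example = """FASTQC:
    fastqc: 0.12.1
GTF2BED:
    perl: 5.26.2
SALMON_INDEX:
    salmon: 1.10.3
"""
expected = ["FASTQC", "GTF2BED", "SALMON_INDEX", "SALMON_QUANT"]
print(missing_processes(example, expected))
```

In practice, the text would be read from output/pipeline_info/nf_core_customrnaseq_software_mqc_versions.yml rather than from an inline string. An empty list means every expected process has reported its versions.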
nf-core-customrnaseq/workflows/customrnaseq.nf
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
IMPORT MODULES / SUBWORKFLOWS / FUNCTIONS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
include { FASTQC } from '../modules/nf-core/fastqc/main'
include { MULTIQC } from '../modules/nf-core/multiqc/main'
include { paramsSummaryMap } from 'plugin/nf-schema'
include { paramsSummaryMultiqc } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { softwareVersionsToYAML } from '../subworkflows/nf-core/utils_nfcore_pipeline'
include { methodsDescriptionText } from '../subworkflows/local/utils_nfcore_customrnaseq_pipeline'
include { SALMON_QUANT } from '../modules/nf-core/salmon/quant/main'
include { SALMON_INDEX } from '../modules/nf-core/salmon/index/main'
include { GTF2BED } from '../modules/local/gtf2bed'
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RUN MAIN WORKFLOW
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/
workflow CUSTOMRNASEQ {
    take:
    ch_samplesheet // channel: samplesheet read in from --input
    main:
    ch_versions = Channel.empty()
    ch_multiqc_files = Channel.empty()
    ch_genome_fasta = Channel.fromPath(params.fasta)
    ch_transcript_fasta = Channel.fromPath(params.transcript_fasta)
    SALMON_INDEX(ch_genome_fasta, ch_transcript_fasta)
    ch_versions = ch_versions.mix(SALMON_INDEX.out.versions)
    ch_gtf = Channel.fromPath(params.gtf)
    def align_mode = false
    def lib_type = "A"
    SALMON_QUANT(
        ch_samplesheet,
        SALMON_INDEX.out.index,
        ch_gtf,
        ch_transcript_fasta,
        align_mode,
        lib_type
    )
    ch_versions = ch_versions.mix(SALMON_QUANT.out.versions)
    GTF2BED( ch_gtf )
    ch_versions = ch_versions.mix(GTF2BED.out.versions)
    //
    // MODULE: Run FastQC
    //
    FASTQC (
        ch_samplesheet
    )
    ch_multiqc_files = ch_multiqc_files.mix(FASTQC.out.zip.collect{it[1]})
    ch_versions = ch_versions.mix(FASTQC.out.versions.first())
    //
    // Collate and save software versions
    //
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.outdir}/pipeline_info",
            name: 'nf_core_' + 'customrnaseq_software_' + 'mqc_' + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }
    //
    // MODULE: MultiQC
    //
    ch_multiqc_config = Channel.fromPath(
        "$projectDir/assets/multiqc_config.yml", checkIfExists: true)
    ch_multiqc_custom_config = params.multiqc_config ?
        Channel.fromPath(params.multiqc_config, checkIfExists: true) :
        Channel.empty()
    ch_multiqc_logo = params.multiqc_logo ?
        Channel.fromPath(params.multiqc_logo, checkIfExists: true) :
        Channel.empty()
    summary_params = paramsSummaryMap(
        workflow, parameters_schema: "nextflow_schema.json")
    ch_workflow_summary = Channel.value(paramsSummaryMultiqc(summary_params))
    ch_multiqc_files = ch_multiqc_files.mix(
        ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml'))
    ch_multiqc_custom_methods_description = params.multiqc_methods_description ?
        file(params.multiqc_methods_description, checkIfExists: true) :
        file("$projectDir/assets/methods_description_template.yml", checkIfExists: true)
    ch_methods_description = Channel.value(
        methodsDescriptionText(ch_multiqc_custom_methods_description))
    ch_multiqc_files = ch_multiqc_files.mix(ch_collated_versions)
    ch_multiqc_files = ch_multiqc_files.mix(
        ch_methods_description.collectFile(
            name: 'methods_description_mqc.yaml',
            sort: true
        )
    )
    MULTIQC (
        ch_multiqc_files.collect(),
        ch_multiqc_config.toList(),
        ch_multiqc_custom_config.toList(),
        ch_multiqc_logo.toList(),
        [],
        []
    )
    emit:
    multiqc_report = MULTIQC.out.report.toList() // channel: /path/to/multiqc_report.html
    versions = ch_versions // channel: [ path(versions.yml) ]
}
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
THE END
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ */
8.1 Other resources
8.1.1 Pipeline linting
The nf-core pipelines lint command checks that a given pipeline follows all nf-core community guidelines. The same checks are run in the automated continuous integration tests, so passing them is important if you would like to contribute to nf-core.
Full documentation on contributing your pipeline to nf-core is available.
8.1.2 Pipeline test profiles
Another important feature of nf-core pipelines is their test profiles. Pipeline-level tests make pipelines more reliable and reproducible by checking that identical results are produced on every run. More documentation is available from nf-core.
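The idea of "identical results on every run" can be illustrated outside of Nextflow. The sketch below is not how nf-core test profiles are implemented. It is a hedged Python illustration that compares the output folders of two runs by file checksum. The function names and the sample file contents are assumptions made for this example.

```python
import hashlib
import tempfile
from pathlib import Path


def checksums(directory):
    """Map each file's path (relative to directory) to the SHA-256 of its contents."""
    root = Path(directory)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(run_a, run_b):
    """List relative paths that are missing from one run or differ in content."""
    a, b = checksums(run_a), checksums(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


# Demo: two throwaway directories stand in for the output folders of two runs
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    Path(first, "quant.sf").write_text("Name\tTPM\ntx1\t10.0\n")
    Path(second, "quant.sf").write_text("Name\tTPM\ntx1\t10.0\n")
    same = differing_files(first, second)      # identical runs: nothing differs

    Path(second, "quant.sf").write_text("Name\tTPM\ntx1\t12.5\n")
    changed = differing_files(first, second)   # a changed value is detected

print(same, changed)
```

Files that legitimately change between runs, such as logs with timestamps, would need to be excluded from a comparison like this.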
8.1.3 Pushing to GitHub
So far, we have developed our pipeline locally. Creating a remote repository can further improve the continuous integration process and streamline the work when multiple people are developing the same pipeline. See the available documentation for details.
This workshop is adapted from the Fundamentals Training, Advanced Training, Developer Tutorials, and Nextflow Patterns materials from Nextflow, nf-core, the nf-core tools documentation, and nf-validation.