Troubleshooting Nextflow run

Objectives
  • Learn basic troubleshooting of nextflow log
  • Learn the structure of nextflow work directory
  • Examine the run command stitched together by nextflow for manual debugging

2.2.1. Nextflow log

It is important to keep a record of the commands you have used to generate your results. Nextflow helps with this by creating and storing metadata and logs about the run in hidden files and folders in your current directory (unless otherwise specified). This data can be used by Nextflow to generate reports. It can also be queried using the Nextflow log command:

nextflow log

The log command has multiple options to facilitate the queries and is especially useful while debugging a workflow and inspecting execution metadata. You can view all of the possible log options with -h flag:

nextflow log -h

To query a specific execution you can use the RUN NAME or a SESSION ID:

nextflow log <run name>

To get more information, you can use the -f option with named fields. For example:

nextflow log <run name> -f process,hash,duration

There are many other fields you can query. You can view a full list of fields with the -l option:

nextflow log -l

Exercise: Use the log command to view with process, hash, and script fields for your tasks from your most recent Nextflow execution.

First, use the log command to get a list of your recent executions:

nextflow log
TIMESTAMP           DURATION    RUN NAME            STATUS  REVISION ID SESSION ID                              COMMAND                                                                                                                 
2025-05-27 15:23:39 2m 16s      hopeful_dubinsky    OK      b89fac3265  54c06115-6867-45e3-86b3-8566a69f7406    nextflow run nf-core/rnaseq -r 3.14.0 -profile apptainer -params-file ./workshop-params.yaml                            
2025-05-27 15:56:36 37.1s       tiny_bartik         OK      b89fac3265  54c06115-6867-45e3-86b3-8566a69f7406    nextflow run nf-core/rnaseq -r 3.14.0 -profile apptainer -params-file ./workshop-params.yaml -resume                    
2025-05-27 15:57:23 35.6s       angry_mcclintock    OK      b89fac3265  54c06115-6867-45e3-86b3-8566a69f7406    nextflow run nf-core/rnaseq -r 3.14.0 -profile apptainer -params-file ./workshop-params.yaml -resume --outdir my_results

Query the process, hash, and script using the -f option for the most recent run:

nextflow log marvelous_shannon -f process,hash,script

[... truncated ...]

NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT d7/01e251   
    salmon quant \
        --geneMap genome.filtered.gtf \
        --threads 2 \
        --libType=ISR \
        --index salmon \
        -1 SRR6357071_1_val_1.fq.gz -2 SRR6357071_2_val_2.fq.gz \
         \
        -o SRR6357071

    if [ -f SRR6357071/aux_info/meta_info.json ]; then
        cp SRR6357071/aux_info/meta_info.json "SRR6357071_meta_info.json"
    fi

    cat <<-END_VERSIONS > versions.yml
    "NFCORE_RNASEQ:RNASEQ:QUANTIFY_PSEUDO_ALIGNMENT:SALMON_QUANT":
        salmon: $(echo $(salmon --version) | sed -e "s/salmon //g")
    END_VERSIONS

[... truncated ... ]

NFCORE_RNASEQ:RNASEQ:MULTIQC    7c/b0bbc5   
    multiqc \
        -n multiqc_report.html \
        -f \
         \
         \
        .

    cat <<-END_VERSIONS > versions.yml
    "NFCORE_RNASEQ:RNASEQ:MULTIQC":
        multiqc: $( multiqc --version | sed -e "s/multiqc, version //g" )
    END_VERSIONS
    

2.2.2. Execution cache and resume

Task execution caching is an essential feature of modern workflow managers. As such, Nextflow provides an automated caching mechanism for every execution. When using the Nextflow -resume option, successfully completed tasks from previous executions are skipped and the previously cached results are used in downstream tasks.

Nextflow caching mechanism works by assigning a unique ID to each task. The task unique ID is generated as a 128-bit hash value composing the the complete file path, file size, and last modified timestamp. These ID’s are used to create a separate execution directory where the tasks are executed and the outputs are stored. Nextflow will take care of the inputs and outputs in these folders for you.

You can re-launch the previously executed nf-core/rnaseq workflow again using -resume, and observe the progress. Change the output directory to be my_results. Notice the time it takes to complete the workflow.

nextflow run nf-core/rnaseq -r 3.14.0 \
    -profile apptainer \
    -params-file ./workshop-params.yaml \
    -resume \
    --outdir my_results
[6d/10f0b4] process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF_FILTER (genome.fasta)                      [100%] 1 of 1, cached: 1 ✔
[77/74cbf2] process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:GTF2BED (genome.filtered.gtf)                  [100%] 1 of 1, cached: 1 ✔
[02/f7e668] process > NFCORE_RNASEQ:RNASEQ:PREPARE_GENOME:MAKE_TRANSCRIPTS_FASTA (rsem/genome.fasta)     [100%] 1 of 1, cached: 1 ✔
...

Executing this workflow will create a my_results directory that contain selected results files, as well as the work directory, which contains further sub-directories.

In the schematic above, the hexadecimal numbers, such as 6d/10f0b4, identify the unique task execution. These numbers are also the prefix of the work subdirectories where each task is executed.

You can inspect the files produced by a task by looking inside the work directory and using these numbers to find the task-specific execution path. Use tab to autocomplete the full file path:

ls work/6d/10f0b4d0e6cf920e35657ce78feb1d/

If you look inside the work directory of a FASTQC process, you will find the files that were staged and created when this task was executed:

ls -la  work/e9/60b2e80b2835a3e1ad595d55ac5bf5/ 
total 1940
drwxrwxr-x 2 larigan larigan   4096 May 27 15:24 .
drwxrwxr-x 3 larigan larigan   4096 May 27 15:23 ..
-rw-rw-r-- 1 larigan larigan      0 May 27 15:24 .command.begin
-rw-rw-r-- 1 larigan larigan      0 May 27 15:24 .command.err
-rw-rw-r-- 1 larigan larigan     34 May 27 15:24 .command.log
-rw-rw-r-- 1 larigan larigan     34 May 27 15:24 .command.out
-rw-rw-r-- 1 larigan larigan  10394 May 27 15:24 .command.run
-rw-rw-r-- 1 larigan larigan    468 May 27 15:24 .command.sh
-rw-rw-r-- 1 larigan larigan    261 May 27 15:24 .command.trace
-rw-rw-r-- 1 larigan larigan      1 May 27 15:24 .exitcode
-rw-rw-r-- 1 larigan larigan 598884 May 27 15:24 SRR6357071_1_fastqc.html
-rw-rw-r-- 1 larigan larigan 365752 May 27 15:24 SRR6357071_1_fastqc.zip
lrwxrwxrwx 1 larigan larigan     66 May 27 15:24 SRR6357071_1.fastq.gz -> /home/larigan/rnaseq_data/testdata/GSE110004/SRR6357071_1.fastq.gz
lrwxrwxrwx 1 larigan larigan     21 May 27 15:24 SRR6357071_1.gz -> SRR6357071_1.fastq.gz
-rw-rw-r-- 1 larigan larigan 604569 May 27 15:24 SRR6357071_2_fastqc.html
-rw-rw-r-- 1 larigan larigan 355487 May 27 15:24 SRR6357071_2_fastqc.zip
lrwxrwxrwx 1 larigan larigan     66 May 27 15:24 SRR6357071_2.fastq.gz -> /home/larigan/rnaseq_data/testdata/GSE110004/SRR6357071_2.fastq.gz
lrwxrwxrwx 1 larigan larigan     21 May 27 15:24 SRR6357071_2.gz -> SRR6357071_2.fastq.gz
-rw-rw-r-- 1 larigan larigan     83 May 27 15:24 versions.yml

The FASTQC process runs twice, executing in a different work directories for each set of inputs. Therefore, in the previous example, the work directory [e9/60b2e8] represents just one of the two sets of input data that was processed.

It’s very likely you will execute a workflow multiple times as you find the parameters that best suit your data. You can save a lot of storage space (and time) by resuming a workflow from the last step that was completed successfully and/or unmodified.

In practical terms, the workflow is executed from the beginning. However, before launching the execution of a process, Nextflow uses the unique task ID to check if the work directory already exists and that it contains a valid command exit state with the expected output files. If this condition is satisfied, the task execution is skipped and previously computed results are saved in the output directory.

Notably, the -resume functionality is very sensitive. Even touching a file in the work directory can invalidate the cache.

Exercise: Invalidate the cache by touching a .fastq.gz file inside the FASTQC task work directory (you can use the touch command). Execute the workflow again with the -resume option. Has the cache has been invalidated?

Execute the workflow for the first time (if you have not already).

Use the task ID shown for the FASTQC process and use it to find and touch a .fastq.gz file:

touch work/ff/21abfa87cc7cdec037ce4f36807d32/SRR6357071_1.fastq.gz

Execute the workflow again with the -resume command option:

nextflow run nf-core/rnaseq -r 3.14.0 \
    -profile apptainer \
    -params-file ./workshop-params.yaml \
    -resume \
    --outdir my_results

You should see that some task were invalid and were executed again.

Why did this happen?

In this example, the caching of one of the two FASTQC tasks were invalid. The fastq file we touch is used by in the pipeline in multiple places. Thus, touching the symbolic link for this file and changing the date of last modification disrupted one of the task execution and its related downstream processes.

2.2.3. Troubleshoot warning and error messages

If we go back to our last exercise (exercise_rnaseq output), you might recall that while that workflow execution completed successfully, there were a couple of warning messages that may be cause for concern:

-[nf-core/rnaseq] Pipeline completed successfully with skipped sampl(es)-
-[nf-core/rnaseq] Please check MultiQC report: 2/2 samples failed strandedness check.-
Completed at: 04-May-2025 15:03:01
Duration    : 7m 59s
CPU hours   : 0.8
Succeeded   : 66
Handling dodgy error messages

The first warning message isn’t very descriptive (see this pull request). You might come across issues like this when running nf-core pipelines, too. Bug reports and user feedback is very important to open source software communities like nf-core. If you come across any issues, submit a GitHub issue or start a discussion in the relevant nf-core Slack channel so others are aware and it can be addressed by the pipeline’s developers.

➤ Take a look at the MultiQC report, as directed by the second message. You can find the MultiQC report in the exercise_rnaseq directory:

ls -la exercise_rnaseq/multiqc/star_salmon/
total 1402
drwxrwxr-x 4 rlupat rlupat    4096 Nov 22 00:29 .
drwxrwxr-x 3 rlupat rlupat    4096 Nov 22 00:29 ..
drwxrwxr-x 2 rlupat rlupat    8192 Nov 22 00:29 multiqc_data
drwxrwxr-x 5 rlupat rlupat    4096 Nov 22 00:29 multiqc_plots
-rw-rw-r-- 1 rlupat rlupat 1419998 Nov 22 00:29 multiqc_report.html

➤ Download the multiqc_report.html using the file navigator panel on the left side of your VS Code window. Right click the file navagator, then select Download. Open the file on your computer.

Take a look a the section labelled WARNING: Fail Strand Check

The warning indicates that the read strandedness we specified in our samplesheet.csv and inferred strandedness identified by the RSeqQC process in the pipeline do not match. In the samplesheet.csv, it seems we have incorrectly specified strandedness as forward, when our raw reads actually show an equal distribution of sense and antisense reads.

For those who are not familiar with RNAseq data, incorrectly specified strandedness may negatively impact the read quantification step (process: Salmon quant) and give us inaccurate results. So, let’s clarify how the Salmon quant process is gathering strandedness information for our input files by default and find a way to address this with the parameters provided by the nf-core/rnaseq pipeline.

2.2.4. Identify the run command for a process

To observe the exact command used a process, we can attempt to infer this information from the module’s main.nf script in the modules/ directory. However, given all the different parameters that may be applied at the process level, this may not be very clear.

➤ Take a look at the Salmon quant main.nf file.

This file contains many function definitions within the process, variable substitutions, and internal parameters determined based on strandedness. This makes it very hard to see what is actually happening in the code, given all the different variables and conditional arguments inside this script.

Above the script block, we can see strandedness is being applied using a few different conditional arguments. Instead of trying to infer how the $strandedness variable is being defined and applied to the process, let’s use the hidden command files saved for this process in its work execution directory.

Hidden files in the work directory!

Remember that the pipeline’s results are cached in the work directory. In addition to the cached files, each task execution directory inside the work directory contains a number of hidden files:

  • .command.sh: The command used for the task.
  • .command.run: Specifying resources, executor, software management profiles to use.
  • .command.out: The task’s standard output log.
  • .command.err: The task’s standard error log.
  • .command.log: A wrapper for the execution output.
  • .command.begin: A file created as soon as the job is launched.
  • .exitcode: A file containing the task exit code (0 if successful)

Within the nextflow log command that we discussed previously, there are multiple options to facilitate pipeline debugging and inspecting pipeline execution metadata. To understand how Salmon is interpreting strandedness, we’re going to use this command to determine the full path to hidden .command.sh scripts for each Salmon quant task that was run. This will allow us to investigate how Salmon handles strandedness and if there is a way for us to override this.

➤ Use the nextflow log command to get the unique run name information of previously executed pipelines. Then, add that run name to your command:

nextflow log <run-name>

After running the command, we can see that it provided a list of all the work subdirectories created for each processes when the pipeline was executed. How do we use this information to find the speicfic hidden.command.sh for Salmon tasks?

➤ Let’s use Bash to query a Nextflow run with the run name from the previous lesson. First, save your run name in a Bash variable run_name. For example:

run_name=marvelous_shannon

➤ And let’s save the tool of interest (salmon) in another Bash variable tool:

tool=salmon

➤ Next, run the following bash command:

nextflow log ${run_name} | while read line;
    do
      cmd=$(ls ${line}/.command.sh 2>/dev/null);
      if grep -q $tool $cmd;
      then  
        echo $cmd;     
      fi; 
    done 

This will list all process .command.sh scripts containing the word ‘salmon’. Notice that there are a few different processes that run Salmon to perform other steps in the workflow. We are looking for Salmon quant which performs the read quantification:

/home/larigan/lesson2.1/work/9c/9cdaec01c009a4fef6de3b50b0d2c9/.command.sh
/home/larigan/lesson2.1/work/d9/e696aa3903f2f7bef3ead3852a7d51/.command.sh
/home/larigan/lesson2.1/work/57/95f7806c62313c5788780d1fadc89a/.command.sh
/home/larigan/lesson2.1/work/2f/bef610318ab85c7fdbb3a773c568d5/.command.sh
/home/larigan/lesson2.1/work/ec/d8a46743dc73214b04e09c7ae9ecb4/.command.sh
/home/larigan/lesson2.1/work/f2/cc71ed58bbfba78ea034a26bd48370/.command.sh
/home/larigan/lesson2.1/work/3b/0a2737be44be977e4b695c5bade23f/.command.sh
/home/larigan/lesson2.1/work/8d/fed78effdd1b28435a698d8a6efb7a/.command.sh
/home/larigan/lesson2.1/work/65/bd7329a29aaf2136f25173a22918ae/.command.sh

Compared with the salmon quant main.nf file, we get a lot more fine scale details from the .command.sh process scripts:

main.nf:

salmon quant \\
        --geneMap $gtf \\
        --threads $task.cpus \\
        --libType=$strandedness \\
        $reference \\
        $input_reads \\
        $args \\
        -o $prefix

.command.sh:

salmon quant \
    --geneMap genome.filtered.gtf \
    --threads 2 \
    --libType=ISF \
    -t genome.transcripts.fa \
    -a SRR6357071.Aligned.toTranscriptome.out.bam \
     \
    -o SRR6357071

From .command.sh, we see that --libType has been set to ISF (ie. forward strandedness), based on our samplesheet.

Exercise: Besides changing the samplesheet input, we can use parameter settings to over-ride the --libType. Use the pipeline Parameters documentation to determine what parameter has to be changed. Instead, we would like this to be ISR (ie. reverse strandedness). How can we do this?

From the pipeline documentation, the --salmon_quant_libtype can be changed. To change the libType specified to Salmon to be ISR, we can specify --salmon_quant_libtype ISR using the command line or in a parameter file.

2.2.5. Using a parameter file

From the previous section we learn that Nextflow can accept a yaml parameter file. Any of the pipeline-specific parameters can be supplied to a Nextflow pipeline in this way.

Exercise: Set the Salmon libType to ISR, inside the workshop-params.yaml file we created previously.

YAML Formatting Tip
  • Strings need to be inside double quotes
  • Booleans (true/false) and numbers do not require quotes

workshop-params.yaml should now contain one additional parameter:

salmon_quant_libtype: "ISR" 

➤ Now that our params file has been saved, we can rerun the pipeline:

nextflow run nf-core/rnaseq -r 3.14.0 \
  -resume 
  -profile apptainer \
  -params-file workshop-params.yaml \
  --outdir exercise_rnaseq 

As the workflow runs a second time, you will notice 4 things:

  1. The nextflow run command is much tidier, due to the use of a -params-file that stores pipeline parameters used in a Nextflow run
  2. The -resume flag. Nextflow has many run options including the ability to use cached output
  3. Some processes will be pulled from the cache. These processes remain unaffected by our addition of the new parameter.
  4. This run of the pipeline will complete in a much shorter time compared to starting the pipeline from the beginning, due to pipeline caching.
-[nf-core/rnaseq] Pipeline completed successfully with skipped sampl(es)-
-[nf-core/rnaseq] Please check MultiQC report: 2/2 samples failed strandedness check.-
Completed at: 27-May-2025 17:13:48
Duration    : 1m 55s
CPU hours   : 0.2 (70.8% cached)
Succeeded   : 11
Cached      : 41

We still seem to be getting the warning Please check MultiQC report: 2/2 samples failed strandedness check.-. Let’s check what --libType has been used inside the salmon process.

Exercise: Determine the hexadecimal code output by Nextflow for a SALMON_QUANT process in your most recent run. Use this code to determine the work execution directory for SALMON_QUANT, and look inside the .command.sh. What --libType has been used? Is it the one we specified in our parameter file?

In the Nextflow output, the following line provides the hexadeximal for a SALMON_QUANT process.

[d9/69e2a7] process > NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_QUANT (SRR6357071)                                       [100%] 2 of 2 ✔

The work execution directory for this process is located in (using tab to autocomplete the folder name):

ls -a work/d9/69e2a7563de5e6046f140a1317c1d2/

Inside the .command.sh file, the --libType parameter matches the one specified in our parameter file:

#!/bin/bash -euo pipefail
salmon quant \
    --geneMap genome.filtered.gtf \
    --threads 2 \
    --libType=ISR \
    -t genome.transcripts.fa \
    -a SRR6357071.Aligned.toTranscriptome.out.bam \
     \
    -o SRR6357071

if [ -f SRR6357071/aux_info/meta_info.json ]; then
    cp SRR6357071/aux_info/meta_info.json "SRR6357071_meta_info.json"
fi

cat <<-END_VERSIONS > versions.yml
"NFCORE_RNASEQ:RNASEQ:QUANTIFY_STAR_SALMON:SALMON_QUANT":
    salmon: $(echo $(salmon --version) | sed -e "s/salmon //g")
END_VERSIONS

The tool is working as we expected!

How do I get rid of the strandedness check warning message?

If we want to remove the warning message Please check MultiQC report: 2/2 samples failed strandedness check, we will have to change the strandedness fields in our samplesheet.csv. Keep in mind, doing this will invalidate the pipeline’s cache and will cause the pipeline to run from the beginning.

Key points
  • Use nextflow log to query the record of commands used in the pipeline
  • Use -resume to re-launch previously executed workflows in order to get Nextflow to utilise its task execution caching feature
  • Examine the .command.sh to inside the work directory to troubleshoot the command that Nextflow use to run a particular task


Next Chapter: Introduction to Nextflow Processes and Channels



This workshop is adapted from various nextflow training materials, including: