Biowulf at the NIH
Bcl2fastq on Biowulf

Bcl2fastq conversion software handles BCL conversion and demultiplexing. Version 1.8.4 adds the ability to mask multiple adapter sequences per read, includes the standard Illumina adapter sequences in the bcl2fastq installation, and makes the stringency of adapter masking configurable.

To set up the environment for bcl2fastq, load the module first:

$ module load bcl2fastq

To see what the module sets up, run:

$ module show bcl2fastq

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the interactive job there.

biowulf $ qsub -I -l nodes=1:g24:c16
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@pXXX]$ module load bcl2fastq
[user@pXXX]$ cd /data/$USER/XXXX

[user@pXXX]$ bcl2fastq \
                --input-dir <BaseCalls_dir> \
                --output-dir <Unaligned>
[user@pXXX]$ exit
qsub: job 2236960.biobos completed

Submitting a single batch job

1. Create a script file containing the lines below. Modify the paths to match your data before running.


#!/bin/bash
# This file is runbcl2fastq
#PBS -N bcl2fastq
#PBS -m be
#PBS -k oe
module load bcl2fastq
cd /data/$USER/run1
bcl2fastq --input-dir xxxx --output-dir xxxx

2. Submit the script using the 'qsub' command on Biowulf:

$ qsub -l nodes=1:g24:c16 /data/$USER/runbcl2fastq
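
The two steps above can be sketched as a short shell snippet that writes the script file and checks its syntax before submission. This is only an illustration: the script path (./runbcl2fastq, overridable through a hypothetical SCRIPT variable) and the directory names BaseCalls and Unaligned are placeholder assumptions, and the snippet prints the qsub command rather than running it.

```shell
#!/bin/bash
# Sketch: write the batch script from step 1 and syntax-check it.
# Assumptions: the script path and the BaseCalls/Unaligned directory
# names are placeholders; replace them with your own paths.
script="${SCRIPT:-./runbcl2fastq}"

# The quoted 'EOF' keeps $USER unexpanded, so it is resolved when the job runs.
cat > "$script" <<'EOF'
#!/bin/bash
# This file is runbcl2fastq
#PBS -N bcl2fastq
#PBS -m be
#PBS -k oe
module load bcl2fastq
cd /data/$USER/run1
bcl2fastq --input-dir BaseCalls --output-dir Unaligned
EOF

# bash -n parses the script without executing any of its commands.
if bash -n "$script"; then
    echo "Syntax OK. Submit with: qsub -l nodes=1:g24:c16 $script"
else
    echo "Syntax error in $script" >&2
fi
```

Checking the syntax first avoids waiting in the queue for a job that would fail immediately on a typo.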


Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile), with one line per run directory. Here is a sample file:

cd /data/$USER/run1; bcl2fastq command 1; bcl2fastq command 2
cd /data/$USER/run2; bcl2fastq command 1; bcl2fastq command 2
........
cd /data/$USER/run20; bcl2fastq command 1; bcl2fastq command 2
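
When there are many run directories, a command file like the sample above can be generated with a loop instead of being typed by hand. The sketch below assumes the directories are named run1 through run20 and that each line runs a single bcl2fastq command with placeholder BaseCalls and Unaligned directories; the CMDFILE variable and those directory names are illustrative assumptions, not part of the original example.

```shell
#!/bin/bash
# Sketch: generate a swarm command file with one line per run directory.
# Assumptions: directories run1..run20 exist under /data/$USER, and the
# BaseCalls/Unaligned names are placeholders for your real paths.
cmdfile="${CMDFILE:-./cmdfile}"

: > "$cmdfile"   # create the file, or empty it if it already exists

for ((i = 1; i <= 20; i++)); do
    # The single-quoted format string keeps $USER literal in the file,
    # so it is expanded when the swarm job runs, not now.
    printf 'cd /data/$USER/run%d; bcl2fastq --input-dir BaseCalls --output-dir Unaligned\n' "$i" >> "$cmdfile"
done

# Report how many command lines were written.
echo "Wrote $(grep -c '^cd ' "$cmdfile") lines to $cmdfile"
```

Each generated line is independent, which is what swarm expects: one line of the command file per job.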

The '-f' and '--module' options for swarm are required.

  • -f: the name of the swarm command file above (required)
  • --module: loads the named module so that its environment variables are set for the swarm jobs (required)
  • -g: memory in GB needed for each command line in the swarm file above (optional)

By default, each line of the command file above is executed on 1 processor core of a node and uses 1 GB of memory. If this is not what you want, specify the '-t' and '-g' flags when you submit the job on Biowulf.

For example, if each command line above needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:

biowulf $ swarm -g 10 -f cmdfile --module bcl2fastq

For more information regarding running swarm, see swarm.html