Biowulf at the NIH
Trimmomatic on Biowulf

Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-ended data. The selection of trimming steps and their associated parameters is supplied on the command line. Trimmomatic was developed at the Usadel lab in Aachen, Germany.

The current trimming steps are:

It works with FASTQ files (using phred+33 or phred+64 quality scores, depending on the Illumina pipeline used), either uncompressed or gzipped. Gzip format is detected from the .gz extension.

For single-ended data, one input and one output file are specified, plus the processing steps. For paired-end data, two input files and four output files are specified: two for the 'paired' output, where both reads survived the processing, and two for the corresponding 'unpaired' output, where one read survived but its partner did not.
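The paired-end file layout described above can be sketched as a small shell helper. The function name pe_outputs and the _forward/_reverse naming pattern are assumptions that simply mirror the file names used in the examples on this page; Trimmomatic accepts any output names, as long as they are given in this order.

```shell
#!/bin/bash
# Hypothetical helper (not part of Trimmomatic): print the four paired-end
# output file names for a given prefix, in the order TrimmomaticPE expects:
# forward paired, forward unpaired, reverse paired, reverse unpaired.
pe_outputs() {
    local prefix="$1"
    printf '%s\n' \
        "${prefix}_forward_paired.fq.gz" \
        "${prefix}_forward_unpaired.fq.gz" \
        "${prefix}_reverse_paired.fq.gz" \
        "${prefix}_reverse_unpaired.fq.gz"
}

# Example: list the output names for lane1
pe_outputs lane1
```

The names end in .gz, so Trimmomatic would write gzipped output for them.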

Use the module commands to set up trimmomatic, as in the example below. Loading the module sets up an alias called 'trimmomatic', which is equivalent to 'java -classpath /usr/local/apps/trimmomatic/Trimmomatic-0.25/trimmomatic-0.25.jar'. The module also sets an environment variable called 'TRIMMOJAR', which points to the location of the trimmomatic jar file.
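Because the java commands on this page rely on $TRIMMOJAR, a script may want to confirm that the module has been loaded before going further. The following is a minimal sketch; the function name check_trimmojar is hypothetical and is not provided by the module.

```shell
#!/bin/bash
# Hypothetical guard (not provided by the trimmomatic module): succeed only
# if TRIMMOJAR is set and names an existing file.
check_trimmojar() {
    if [ -z "${TRIMMOJAR:-}" ]; then
        echo "TRIMMOJAR is not set; run 'module load trimmomatic' first" >&2
        return 1
    fi
    if [ ! -f "$TRIMMOJAR" ]; then
        echo "TRIMMOJAR points to a missing file: $TRIMMOJAR" >&2
        return 1
    fi
    return 0
}

# Typical use inside a script, after 'module load trimmomatic':
#   check_trimmojar || exit 1
```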

Running an interactive Trimmomatic job

First allocate an interactive node as in the example below.

biowulf% qsub -I -l nodes=1
qsub: waiting for job 2670762.biobos to start
qsub: job 2670762.biobos ready

[user@p282 ~]$

[user@p282 ~]$ module avail trimmomatic

----------------- /usr/local/Modules/3.2.9/modulefiles ------------------
trimmomatic/0.25
[user@p282 ~]$ module load trimmomatic
[user@p282 ~]$
[user@p282 ~]$ module list
Currently Loaded Modulefiles:
  1) trimmomatic/0.25

[user@p282 ~]$ trimmomatic org.usadellab.trimmomatic.TrimmomaticPE s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz ILLUMINACLIP:illuminaClipping.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

[user@p282 ~]$ java -Xmx2g -classpath $TRIMMOJAR org.usadellab.trimmomatic.TrimmomaticPE s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz ILLUMINACLIP:illuminaClipping.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

[user@p282 ~]$ exit
qsub: job 2670762.biobos completed

The first sample trimmomatic command in this example uses the 'trimmomatic' alias. The second sample command requests 2g of memory (larger than the default 1g set by the trimmomatic alias) and uses the $TRIMMOJAR environment variable.

Running a Trimmomatic batch job

Set up a batch script similar to the one below:

#!/bin/bash
# --- this script is called trim.bat -----

cd /data/mydir
module load trimmomatic
java -classpath $TRIMMOJAR org.usadellab.trimmomatic.TrimmomaticPE \
    s_1_1_sequence.txt.gz s_1_2_sequence.txt.gz \
    lane1_forward_paired.fq.gz lane1_forward_unpaired.fq.gz \
    lane1_reverse_paired.fq.gz lane1_reverse_unpaired.fq.gz \
    ILLUMINACLIP:illuminaClipping.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

Submit this job with:

qsub -l nodes=1 trim.bat

Running a swarm of trimmomatic jobs

If you have a large number of trimmomatic jobs to be run, swarm is a convenient way to do so. Create a swarm command file called, say, trim.swarm, similar to the one below:

java -jar $TRIMMOJAR PE -threads 8 -phred33 input1a input1b [...]
java -jar $TRIMMOJAR PE -threads 8 -phred33 input2a input2b [...]
java -jar $TRIMMOJAR PE -threads 8 -phred33 input3a input3b [...]
[...etc....]
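For many samples, a swarm command file like the one above can be generated with a short loop rather than written by hand. The sketch below is only an illustration: the function name make_swarm, the sample names, and the <sample>_1_sequence.txt.gz / <sample>_2_sequence.txt.gz input naming are assumptions, and the trimming steps are copied from the batch example on this page rather than taken from the elided arguments above.

```shell
#!/bin/bash
# Sketch: print one Trimmomatic paired-end command per sample name.
# The function name and the file naming scheme are hypothetical.
make_swarm() {
    local sample
    for sample in "$@"; do
        # \$TRIMMOJAR is escaped so that it stays literal in the output and
        # is expanded later, when the line runs with the module loaded.
        echo "java -jar \$TRIMMOJAR PE -threads 8 -phred33" \
             "${sample}_1_sequence.txt.gz ${sample}_2_sequence.txt.gz" \
             "${sample}_forward_paired.fq.gz ${sample}_forward_unpaired.fq.gz" \
             "${sample}_reverse_paired.fq.gz ${sample}_reverse_unpaired.fq.gz" \
             "ILLUMINACLIP:illuminaClipping.fa:2:40:15 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36"
    done
}

# Print the commands for three hypothetical samples
make_swarm s_1 s_2 s_3
```

Redirecting the output, for example with make_swarm s_1 s_2 s_3 > trim.swarm, would produce a file with one command per line.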

Submit this swarm with:

swarm -f trim.swarm --module trimmomatic

Documentation

Trimmomatic website