Biowulf at the NIH
Cufflinks on Biowulf

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.

Cufflinks is a collaborative effort between the Laboratory for Mathematical and Computational Biology, led by Lior Pachter at UC Berkeley, Steven Salzberg's group at the University of Maryland Center for Bioinformatics and Computational Biology, and Barbara Wold's lab at Caltech.

Cufflinks is provided under the OSI-approved Boost License.

The environment variables must be set before Cufflinks is run. The easiest way to do this is with the module commands, as in the example below.

[user@biowulf]$ module avail cufflinks
----------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------
cufflinks/0.9.3          cufflinks/1.3.0          cufflinks/2.0.1
cufflinks/1.2.0          cufflinks/2.0.0          cufflinks/2.0.2(default)

[user@biowulf]$ module load cufflinks
[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) cufflinks/2.0.2

[user@biowulf]$ module unload cufflinks
[user@biowulf]$ module load cufflinks/2.0.1
[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) cufflinks/2.0.1

[user@biowulf]$ module show cufflinks
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/cufflinks/2.0.2:

module-whatis    Sets up cufflinks 2.0.2
prepend-path     PATH /usr/local/apps/cufflinks/2.0.2
prepend-path     LD_LIBRARY_PATH /usr/local/boost_1_44_0/lib
-------------------------------------------------------------------

The iGenomes collection is available on helix/biowulf in /fdb/igenomes.

Illumina has provided the RNA-Seq user community with iGenomes, a set of genome sequence indexes (including Bowtie, Bowtie2, and BWA indexes) and GTF transcript annotation files. These files can be used with TopHat and Cufflinks to quickly perform expression analysis and gene discovery. The annotation files are augmented with the tss_id and p_id GTF attributes that Cufflinks needs to perform differential splicing, CDS output, and promoter use analysis.
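
Before relying on an annotation file for these analyses, it can be worth confirming that it actually carries the tss_id and p_id attributes. The following is a minimal sketch, not part of iGenomes or Cufflinks: the function name has_gtf_attr is hypothetical, and the one-line sample file is a stand-in that should be replaced with the path to a real GTF.

```shell
#!/bin/bash
# has_gtf_attr FILE ATTR: succeed if at least one record in FILE carries
# the attribute ATTR in its ninth (attributes) column.
has_gtf_attr() {
    awk -F'\t' -v attr="$2" '
        $9 ~ ("(^|[ ;])" attr " ") { found = 1; exit }
        END { exit !found }
    ' "$1"
}

# Tiny stand-in annotation so the sketch runs anywhere (hypothetical content).
sample_gtf=$(mktemp)
printf 'chr1\tsketch\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; tss_id "TSS1"; p_id "P1";\n' > "$sample_gtf"

# Report each attribute that Cufflinks needs for the analyses above.
for attr in tss_id p_id; do
    if has_gtf_attr "$sample_gtf" "$attr"; then
        echo "$attr: present"
    else
        echo "$attr: missing"
    fi
done
```

The check only looks for the attribute name in the ninth column; it does not validate the values.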

Sample Sessions On Biowulf

Submitting a single cufflinks batch job

1. Create a script file containing lines similar to those below. Modify the paths before running.

2. If you specify a thread count (-p), match it to the number of cores on the type of node you request; otherwise the node will be overloaded or underloaded. For example, g72 nodes have 16 cores and g4 nodes have 2 cores, so specify '-p 2' if you ask for g4 nodes.

# This file is runCufflinks
#PBS -N Cufflinks
#PBS -m be
#PBS -k oe
module load cufflinks
cd /data/$USER/cufflinks/run1
cufflinks -p 4 InputFile

3. Submit the script using the 'qsub' command on Biowulf. In this example, '-p 4' is specified because the job is going to be submitted to a g8 node. Change the '-p' number if you decide to use other nodes.

$ qsub -l nodes=1:g8 /data/$USER/runCufflinks
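
The thread count can also be derived on the node instead of being hard-coded. This is a minimal sketch, assuming that `nproc` (GNU coreutils) is available on the compute node and that using every core it reports suits the node type requested. The variable names are illustrative, and the script only prints the command so that the value can be inspected first.

```shell
#!/bin/bash
# Ask the node how many cores are available to this process.
threads=$(nproc)

# Build the cufflinks command with a matching -p value.
cmd="cufflinks -p $threads InputFile"

# Print instead of executing; run the command directly once the value looks right.
echo "$cmd"
```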


Submitting a swarm of cufflinks jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:

cufflinks -p 4 InputFile -o /data/user/cufflinks/out1
cufflinks -p 4 InputFile -o /data/user/cufflinks/out2
........
........
cufflinks -p 4 InputFile -o /data/user/cufflinks/out20
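
A command file like this does not have to be typed by hand. The following is a minimal sketch that writes 20 lines of the same shape with a loop; the output base directory and the file name cmdfile are placeholders, and InputFile is kept literal as in the sample above, so in practice each line would usually name its own input.

```shell
#!/bin/bash
# Hypothetical locations; adjust to your own data directory.
outbase="/data/${USER:-user}/cufflinks"
cmdfile="cmdfile"

# Start from an empty command file, then write one cufflinks command per run.
: > "$cmdfile"
for i in $(seq 1 20); do
    echo "cufflinks -p 4 InputFile -o $outbase/out$i" >> "$cmdfile"
done

# Show how many commands were written.
wc -l < "$cmdfile"
```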

The '-f' and '--module' options for swarm are required, and two other flags are possibly needed to submit a swarm job: '-t' and '-g'.

By default, each line of the command file above will be executed on 1 processor core of a node and use 1gb of memory. If this is not what you want, you will need to specify '-t' and '-g' flags when you submit the job on biowulf.

In the example above, each cufflinks command runs with 4 threads ('-p 4'), so pass '-t 4' to swarm:

biowulf> $ swarm -t 4 -f cmdfile --module cufflinks

If each command in the file also needs 10gb of memory instead of the default 1gb, tell swarm by including the '-g 10' flag:

biowulf> $ swarm -g 10 -t 4 -f cmdfile --module cufflinks
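
Because the '-t' value given to swarm should agree with the '-p' value on every line of the command file, a quick consistency check before submitting can help. This is a rough sketch, not a swarm feature: the two-line command file is a stand-in, and the variable names are illustrative.

```shell
#!/bin/bash
# Stand-in command file so the sketch runs on its own.
cmdfile=$(mktemp)
printf 'cufflinks -p 4 InputFile -o out1\ncufflinks -p 4 InputFile -o out2\n' > "$cmdfile"

# Collect the distinct values that follow '-p' across all lines.
pvals=$(grep -o -e '-p [0-9][0-9]*' "$cmdfile" | awk '{ print $2 }' | sort -u)

# Count the distinct values (0 if no '-p' was found at all).
n=$(printf '%s\n' "$pvals" | grep -c .)

# If every line agrees, that single value is the one to pass to 'swarm -t'.
if [ "$n" -eq 1 ]; then
    swarm_cmd="swarm -t $pvals -f cmdfile --module cufflinks"
    echo "$swarm_cmd"
else
    echo "lines disagree on -p, or no -p found" >&2
fi
```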

For more information regarding running swarm, see swarm.html


Submitting an interactive cufflinks job

1. First allocate a node from the cluster, then run commands interactively on that node. DO NOT RUN ON THE BIOWULF LOGIN NODE:

$ qsub -I -l nodes=1:g8

or, if your job requires more memory,

$ qsub -I -l nodes=1:g24:c16

2. Once the job has started and a node is allocated, run the commands interactively.

pXXX> $ cd /data/$USER/cufflinks
pXXX> $ module load cufflinks
pXXX> $ cufflinks -p 16 InputFile -o /data/user/cufflinks/out1

pXXX> $ exit

'-p 16' is used in the command because the g24 node requested has 16 cores.