Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one.
Cufflinks is a collaborative effort between the Laboratory for Mathematical and Computational Biology, led by Lior Pachter at UC Berkeley, Steven Salzberg's group at the University of Maryland Center for Bioinformatics and Computational Biology, and Barbara Wold's lab at Caltech.
Cufflinks is provided under the OSI-approved Boost License.
The environment variables must be set before Cufflinks can be used. The easiest way to do this is with the module commands, as in the example below.
[user@biowulf]$ module avail cufflinks

----------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------
cufflinks/0.9.3          cufflinks/1.3.0          cufflinks/2.0.1
cufflinks/1.2.0          cufflinks/2.0.0          cufflinks/2.0.2(default)

[user@biowulf]$ module load cufflinks
[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) cufflinks/2.0.2

[user@biowulf]$ module unload cufflinks
[user@biowulf]$ module load cufflinks/2.0.1
[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) cufflinks/2.0.1

[user@biowulf]$ module show cufflinks
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/cufflinks/2.0.2:

module-whatis    Sets up cufflinks 2.0.2
prepend-path     PATH /usr/local/apps/cufflinks/2.0.2
prepend-path     LD_LIBRARY_PATH /usr/local/boost_1_44_0/lib
-------------------------------------------------------------------
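After loading a module, it can be worth confirming that the cufflinks executable is actually on the PATH before starting a long run. The following is a minimal sketch, not part of the module system itself; the STATUS variable is a name introduced here for illustration only.

```shell
#!/bin/sh
# Sketch: report whether a cufflinks executable can be found on the PATH.
# STATUS is a hypothetical helper variable, not something the module sets.
if command -v cufflinks >/dev/null 2>&1; then
    STATUS="found: $(command -v cufflinks)"
else
    STATUS="not found; run 'module load cufflinks' first"
fi
echo "cufflinks $STATUS"
```

If the message says "not found", load the module again and re-run the check.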
The iGenomes collection is available on helix/biowulf in /fdb/igenomes.
Illumina has provided the RNA-Seq user community with a set of genome sequence indexes (including Bowtie, Bowtie2, and BWA indexes) as well as GTF transcript annotation files, collectively called iGenomes. These files can be used with TopHat and Cufflinks to quickly perform expression analysis and gene discovery. The annotation files are augmented with the tss_id and p_id GTF attributes that Cufflinks needs to perform differential splicing, CDS output, and promoter use analysis.
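To see which annotation files are available, the iGenomes directory can be searched for GTF files. This is only a sketch: the search depth and the '*.gtf' file pattern are assumptions about how the directory is laid out, and IGENOMES and GTF_LIST are names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: list a few GTF annotation files under the iGenomes directory.
# The depth limit and the '*.gtf' pattern are assumptions about the layout.
IGENOMES=${IGENOMES:-/fdb/igenomes}
if [ -d "$IGENOMES" ]; then
    GTF_LIST=$(find "$IGENOMES" -maxdepth 6 -name '*.gtf' 2>/dev/null | head -n 5)
    echo "$GTF_LIST"
else
    GTF_LIST=""
    echo "$IGENOMES not found on this host"
fi
```

A GTF file found this way could then be passed to cufflinks as a reference annotation with its '-G' option.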
Sample Sessions On Biowulf
Submitting a single cufflinks batch job
1. Create a script file containing lines similar to the sample script below. Modify the paths before running.
2. If you specify a thread count (-p), match it to the number of cores on the type of node you request; otherwise the node will be overloaded or underutilized. For example, g72 nodes have 16 cores and g4 nodes have 2 cores, so specify '-p 2' if you request g4 nodes.
#!/bin/bash
# This file is runCufflinks
#
#PBS -N Cufflinks
#PBS -m be
#PBS -k oe

module load cufflinks
cd /data/$USER/cufflinks/run1
cufflinks -p 4 InputFile
3. Submit the script using the 'qsub' command on Biowulf, as in the example below. Here '-p 4' is specified because the job will be submitted to a g8 node; change the '-p' number if you decide to use other nodes.
$ qsub -l nodes=1:g8 /data/$USER/runCufflinks
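Instead of hard-coding the '-p' value for each node type, a batch script could derive it from the number of cores on the node it lands on. The lines below are a sketch of that idea, assuming the job has the whole node to itself; THREADS and CMD are names introduced here, and the command is only printed, not executed.

```shell
#!/bin/sh
# Sketch: set the cufflinks thread count from the cores available on this node.
# Assumes the job owns the whole node; falls back to 1 if the count is unknown.
THREADS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
CMD="cufflinks -p $THREADS InputFile"
# Print the command for inspection; replace 'echo' with the real call in a job script.
echo "$CMD"
```

With this approach the same script can be submitted to different node types without editing the '-p' value.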
Submitting a swarm of cufflinks jobs
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:
cufflinks -p 4 InputFile -o /data/user/cufflinks/out1
cufflinks -p 4 InputFile -o /data/user/cufflinks/out2
........
........
cufflinks -p 4 InputFile -o /data/user/cufflinks/out20
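A command file with many similar lines does not have to be typed by hand. The loop below is a sketch that writes 20 lines in the format of the sample file; CMDFILE and OUTROOT are names introduced here, and InputFile remains a placeholder that would normally differ from line to line.

```shell
#!/bin/sh
# Sketch: generate a swarm command file with one cufflinks command per line.
# CMDFILE and OUTROOT are hypothetical names; InputFile is a placeholder.
CMDFILE=${CMDFILE:-cmdfile}
OUTROOT=/data/user/cufflinks
: > "$CMDFILE"                      # start with an empty file
i=1
while [ "$i" -le 20 ]; do
    echo "cufflinks -p 4 InputFile -o $OUTROOT/out$i" >> "$CMDFILE"
    i=$((i + 1))
done
echo "wrote $(wc -l < "$CMDFILE") lines to $CMDFILE"
```

Each generated line writes to its own output directory, so the jobs do not overwrite one another's results.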
The '-f' and '--module' options for swarm are required, and two other flags are possibly needed to submit a swarm job: '-t' and '-g'.
- -f: the swarm command file name above (required)
- --module: loads the named module so that the environment variables are set for the swarm jobs (required)
- -t: number of processors per node to use for each command line in the swarm file above (optional)
- -g: GB of memory needed for each command line in the swarm file above (optional)
By default, each line of the command file above will be executed on 1 processor core of a node and use 1 GB of memory. If this is not what you want, you will need to specify the '-t' and '-g' flags when you submit the job on Biowulf.
In the example above, cufflinks is run with '-p 4' (4 threads), so pass '-t 4' to swarm:
biowulf> $ swarm -t 4 -f cmdfile --module cufflinks
If each command line above also needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:
biowulf> $ swarm -g 10 -t 4 -f cmdfile --module cufflinks
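Because the '-t' value given to swarm should agree with the '-p' value inside the command file, one option is to read the thread count from the file when building the swarm command. This is a sketch only: THREADS, MEM_GB and SWARM_CMD are names introduced here, a one-line sample file is created if none exists, and the swarm command is printed rather than submitted.

```shell
#!/bin/sh
# Sketch: keep swarm's -t in step with the -p value used in the command file.
CMDFILE=cmdfile
# Create a one-line sample file only if no command file exists yet.
[ -f "$CMDFILE" ] || echo "cufflinks -p 4 InputFile -o /data/user/cufflinks/out1" > "$CMDFILE"
# Take the largest -p value found in the file as the per-command thread count.
THREADS=$(sed -n 's/.* -p \([0-9][0-9]*\).*/\1/p' "$CMDFILE" | sort -n | tail -n 1)
MEM_GB=10                           # memory per command line, in GB
SWARM_CMD="swarm -g $MEM_GB -t $THREADS -f $CMDFILE --module cufflinks"
# Print the command for review instead of submitting it.
echo "$SWARM_CMD"
```

Using the largest '-p' value errs on the side of requesting enough processors when lines differ.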
For more information regarding running swarm, see swarm.html
Submitting an interactive cufflinks job
1. To do so, first allocate a node from the cluster, then run commands interactively on that node. DO NOT RUN ON THE BIOWULF LOGIN NODE:
or, if your job requires more memory,
2. Once the job has started and a node is allocated, run the interactive commands.
pXXX> $ cd /data/$USER/cufflinks
pXXX> $ module load cufflinks
pXXX> $ cufflinks -p 16 InputFile -o /data/user/cufflinks/out1
pXXX> $ exit


