Biowulf at the NIH
Trinity and Trinotate on Helix & Biowulf

Trinity represents a novel method for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules: Inchworm, Chrysalis, and Butterfly, applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes. Briefly, Inchworm assembles the reads into contigs, Chrysalis clusters the contigs and builds a de Bruijn graph for each cluster, and Butterfly traces paths through each graph to report full-length transcripts.

Trinity was developed at the Broad Institute & the Hebrew University of Jerusalem. [Trinity website]

Trinotate is a comprehensive annotation suite designed for automatic functional annotation of transcriptomes, particularly de novo assembled transcriptomes, from model or non-model organisms. Trinotate makes use of a number of well-referenced methods for functional annotation, including homology search against known sequence data (NCBI-BLAST), protein domain identification (HMMER/PFAM), protein signal prediction (SignalP/TMHMM), and comparison to currently curated annotation databases (EMBL Uniprot eggNOG/GO Pathways databases). All functional annotation data derived from the analysis of transcripts is integrated into a SQLite database, which allows fast, efficient searching for terms with specific qualities related to a desired scientific hypothesis, or serves as a means to create a whole annotation report for a transcriptome.

Several versions of Trinity are available on Helix/Biowulf. The available versions can be listed and loaded with the module commands, as in the example below. Note that loading the Trinity module also implicitly loads bowtie/1.0.0, which Trinity requires, so you don't need to load bowtie/1.0.0 separately.

[user@helix ~]$ module avail trinity                      (see what versions are available)
------------------------ /usr/local/Modules/3.2.9/modulefiles ---------------------
trinity/r2012-06-08 trinity/r2013-02-25 trinity/r2013-08-14 trinity/r2013-11-10

[user@helix ~]$ module load trinity                        (load the default version)

[user@helix ~]$ module list                               (confirm what you just loaded)
Currently Loaded Modulefiles:
  1) trinity/r2013-11-10

[user@helix ~]$ module unload trinity                     (unload that version)

[user@helix ~]$ module load trinity/r2013-02-25           (load a different version)

Likewise, to run Trinotate, load the 'trinotate' module, which adds signalp, tmhmm, rnammer, and BLAST+ to your path along with Trinity.
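
After loading a module it can be handy to confirm that the tools it promises are actually on your path before starting a long job. The following is a minimal sketch of such a check, not part of the Trinity or Trinotate packages; the function name missing_tools is hypothetical, and the executable names in the usage comment are assumptions that may differ between installed versions.

```shell
#!/bin/bash
# Print the name of every command given as an argument that cannot be
# found on the PATH. Returns non-zero if at least one is missing.
missing_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# Possible use after 'module load trinotate' (executable names are assumed):
#   missing_tools blastx hmmscan signalp tmhmm rnammer || echo "some tools are missing"
```

The function prints nothing and returns success when every command is found, so it can be used directly in an 'if' or '||' test at the top of a batch script.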

Running Trinity on Helix

The sample runs provided with the Trinity package tend to spawn multiple Java and Perl processes and run for a long time. Trinity is therefore better suited to the Biowulf cluster than to Helix.

Running a Trinity batch job on Biowulf

The following batch job uses the sample data provided with the Trinity package. It runs one of the sample pipelines, test_full_edgeR_pipeline. The scripts in the sample data have been modified to use the correct paths for Biowulf/Helix.

Set up a batch script along the following lines:

#!/bin/bash
#
# this script is called trinity_test.bat

# sets up the paths for trinity, and creates the 
#   environment variable $TRINITY_ROOT
module load trinity

cd /data/$USER/trinity

# copy the sample data tree
cp -r $TRINITY_ROOT/sample_data .

# cd to one of the examples and run it
cd sample_data/test_full_edgeR_pipeline
./runMe.sh

Submit this script with, for example:

qsub -l nodes=1:g8 trinity_test.bat
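
The batch script above assumes that /data/$USER/trinity already exists and carries on even if a step fails. A slightly more defensive way to stage the sample data might look like the sketch below. The helper stage_sample_data is hypothetical (it is not provided by the Trinity module), and the paths in the usage comment are simply the ones used in the script above.

```shell
#!/bin/bash
# Copy a sample-data tree into a working directory, creating the working
# directory if needed and stopping at the first failure.
# Prints the path of the copied tree on success.
stage_sample_data() {
    local src=$1 workdir=$2
    if [ ! -d "$src" ]; then
        echo "no such directory: $src" >&2
        return 1
    fi
    mkdir -p "$workdir" || return 1
    cp -r "$src" "$workdir"/ || return 1
    echo "$workdir/$(basename "$src")"
}

# Possible use inside the batch script, after 'module load trinity':
#   dir=$(stage_sample_data "$TRINITY_ROOT/sample_data" "/data/$USER/trinity") || exit 1
#   cd "$dir/test_full_edgeR_pipeline" && ./runMe.sh
```

Exiting as soon as the copy fails keeps the job from running runMe.sh in the wrong directory.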

Submitting a Trinotate batch job

Note: A large number of Blast-formatted databases are available in /fdb/blastdb/. The PFAM database is in /fdb/fastadb/pfam.

Sample batch script:

#!/bin/bash

cd $PBS_O_WORKDIR

module load trinotate

blastx -query Trinity.fasta -db swissprot -num_threads 8 -max_target_seqs 1 -outfmt 6 > blastx.outfmt6
hmmscan --cpu 8 --domtblout TrinotatePFAM.out Pfam-A.hmm transdecoder.pep > pfam.log
Trinotate  my.sqlite LOAD_swissprot_blastp ./blastx.outfmt6

Submit this job with, for example:

qsub -l nodes=1:g24 run.bat
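
Once the job has finished, the tabular BLAST output (-outfmt 6) can be inspected with standard shell tools before it is loaded into the SQLite database. In that format column 1 is the query ID and column 11 is the e-value. The sketch below shows two small helpers for this; the function names and the e-value cutoff in the usage comment are illustrative assumptions, not part of Trinotate.

```shell
#!/bin/bash
# Count the distinct query sequences (column 1) in a BLAST -outfmt 6 file,
# i.e. how many transcripts received at least one hit.
count_queries_with_hits() {
    cut -f1 "$1" | sort -u | wc -l | tr -d ' '
}

# Print only the hits whose e-value (column 11) is at or below a cutoff.
filter_by_evalue() {
    awk -F'\t' -v max="$2" '($11 + 0) <= (max + 0)' "$1"
}

# Possible use on the output of the batch job above:
#   count_queries_with_hits blastx.outfmt6
#   filter_by_evalue blastx.outfmt6 1e-5 > blastx.filtered.outfmt6
```

Counting distinct query IDs rather than lines matters because a single query can contribute more than one line to the file.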

Running Trinity interactively

It may be useful to run some test jobs interactively for debugging purposes, but most Trinity jobs should be run via the batch system as above.

Sample session:

biowulf% qsub -l nodes=1:g8 -I
qsub: waiting for job 5831646.biobos to start
qsub: job 5831646.biobos ready

[user@p1724 ~]$ cd /data/$USER/trinity

[user@p1724 ~]$ cp -r $TRINITY_ROOT/sample_data .

[user@p1724 ~]$ cd sample_data/test_full_edgeR_pipeline

[user@p1724 ~]$ ls
cleanme.pl  Makefile  rnaseq_reads  runMe.sh  samples_n_reads_decribed.txt

[user@p1724 ~]$ ./runMe.sh


#################################################################
Uncompressing rnaseq_reads/Sp_ds.10k.right.fq.gz
#################################################################
CMD: gunzip -c rnaseq_reads/Sp_ds.10k.right.fq.gz > rnaseq_reads/Sp_ds.10k.right.fq
TIME: 0.0 min. for gunzip -c rnaseq_reads/Sp_ds.10k.right.fq.gz > rnaseq_reads/Sp_ds.10k.right.fq


#################################################################
Uncompressing rnaseq_reads/Sp_ds.10k.left.fq.gz
#################################################################
CMD: gunzip -c rnaseq_reads/Sp_ds.10k.left.fq.gz > rnaseq_reads/Sp_ds.10k.left.fq
TIME: 0.0 min. for gunzip -c rnaseq_reads/Sp_ds.10k.left.fq.gz > rnaseq_reads/Sp_ds.10k.left.fq


#################################################################
Concatenating left.fq files
#################################################################
CMD: cat rnaseq_reads/Sp_ds.10k.right.fq >> reads.ALL.left.fq
TIME: 0.0 min. for cat rnaseq_reads/Sp_ds.10k.right.fq >> reads.ALL.left.fq
[...]
#################################################################
Running Trinity de novo transcriptome assembly
#################################################################
CMD: /spin1/sys/i386/usrlocal/apps/trinity/trinityrnaseq_r20131110/util/..//Trinity.pl --left reads.ALL.left.fq --right reads.ALL.right.fq  --seqType fq  --JM 1G  --CPU 4  --SS_lib_type RF 
Current settings:
core file size          (blocks, -c) 1
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 65536
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 7864320
open files                      (-n) 16384
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 65536
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited


Paired mode requires bowtie. Found bowtie at: /usr/local/apps/bowtie/1.0.0/bowtie

Found samtools at: /usr/local/bin/samtools

-since butterfly will eventually be run, lets test for proper execution of java
#######################################
Running Java Tests
Thursday, March 20, 2014: 13:22:10CMD: java -Xmx64m -jar /spin1/sys/i386/usrlocal/apps/trinity/trinityrnaseq_r20131110/util/ExitTester.jar 0
CMD finished (0 seconds)
Thursday, March 20, 2014: 13:22:10CMD: java -Xmx64m -jar /spin1/sys/i386/usrlocal/apps/trinity/trinityrnaseq_r20131110/util/ExitTester.jar 1
-we properly captured the java failure status, as needed.  Looking good.
Java tests succeeded.
[...]

[user@p1724 ~]$ exit
Job 5831646.biobos completed.

biowulf% 

Running Trinotate interactively

Sample session:

biowulf% qsub -l nodes=1:g8 -I
qsub: waiting for job 5831646.biobos to start
qsub: job 5831646.biobos ready

[susanc@p2319 trinity_out_dir]$ module load trinotate
[susanc@p2319 trinity_out_dir]$ blastx -query Trinity.fasta -db /fdb/blastdb/swissprot -num_threads 8 -max_target_seqs 1 -outfmt 6 > blastx.outfmt6
Selenocysteine (U) at position 73 replaced by X
Selenocysteine (U) at position 43 replaced by X
Selenocysteine (U) at position 129 replaced by X
Selenocysteine (U) at position 73 replaced by X
Selenocysteine (U) at position 40 replaced by X
[...etc...]
[susanc@p2319 trinity_out_dir]$ exit
Job 5831646.biobos completed.

Documentation

Trinity documentation at SourceForge.

Trinotate documentation at SourceForge.