Biowulf at the NIH
TransDecoder on Helix and Biowulf
TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly with Trinity, or those reconstructed from RNA-Seq alignments to the genome with TopHat and Cufflinks.

TransDecoder is developed and maintained by Brian Haas at the Broad Institute and Alexie Papanicolaou at the Commonwealth Scientific and Industrial Research Organisation (CSIRO). It is integrated into related software such as Trinity, PASA, EVidenceModeler, and Trinotate. [TransDecoder website]

Running TransDecoder on Helix

The following example uses the sample data provided with the program. All of the TransDecoder scripts are in $TD_HOME, which is set when you load the transdecoder module. Sample session:

helix% module load transdecoder

helix% mkdir transdecoder_test

helix% cd transdecoder_test

helix% cp -r $TD_HOME/sample_data .

helix% cd sample_data; gunzip *.gz

helix% $TD_HOME/cufflinks_gtf_to_alignment_gff3.pl transcripts.gtf > transcripts.gff3

helix% $TD_HOME/util/cufflinks_gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta
-parsing cufflinks output: transcripts.gtf
-parsing genome fasta: test.genome.fasta
-done parsing genome.
// processing 7000000090838467

helix% TransDecoder -t transcripts.fasta
CMD: /usr/local/apps/transdecoder/transdecoder_rel16JAN2014/util/get_top_longest_fasta_entries.pl transdecoder.tmp.123160/longest_orfs.cds 2000 > transdecoder.tmp.123160/redundant_top
CMD: /usr/local/apps/transdecoder/transdecoder_rel16JAN2014/util/bin/cd-hit-est -r 1 -i transdecoder.tmp.123160/redundant_top -o transdecoder.tmp.123160/redundant_top.nr90 -M 0 -T 2 >/dev/null 2>/dev/null
CMD: /usr/local/apps/transdecoder/transdecoder_rel16JAN2014/util/get_top_longest_fasta_entries.pl transdecoder.tmp.123160/redundant_top.nr90 500 > transdecoder.tmp.123160/longest_orfs.cds.top_500_longest
[...]
-indexing [CUFF.9.1|g.6]    
transdecoder is finished.

helix%

You should now have several files named 'transcripts.fasta.transdecoder.*' containing the TransDecoder output.
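As a quick sanity check after the run, the hypothetical helper functions below (they are not part of TransDecoder) count the distinct transcripts in the Cufflinks GTF and the records in a FASTA file. They assume each GTF line carries a transcript_id "..." attribute, as Cufflinks output does, and that the files being counted are plain FASTA.

```bash
#!/bin/bash
# Hypothetical helpers for checking a TransDecoder run (not shipped with TransDecoder).

# Print the number of distinct transcript IDs in a GTF file.
# Assumes attributes of the form: transcript_id "CUFF.9.1";
count_gtf_transcripts() {
    grep -o 'transcript_id "[^"]*"' "$1" | sort -u | awk 'END { print NR }'
}

# Print the number of FASTA records (header lines starting with '>') in a file.
count_fasta_entries() {
    grep -c '^>' "$1"
}
```

For example, from the sample_data directory:

helix% count_gtf_transcripts transcripts.gtf
helix% count_fasta_entries transcripts.fasta
helix% count_fasta_entries transcripts.fasta.transdecoder.pep

If the conversion script writes one cDNA sequence per transcript, the first two numbers should agree. The .pep name is an assumption about which of the transcripts.fasta.transdecoder.* files holds the predicted peptides; substitute the actual file name from your run.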

Running a single TransDecoder job on Biowulf

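Create a batch script (td.bat in the submission command below) along the following lines. Submit it from the directory that contains the unpacked input files, since the script reads transcripts.gtf and test.genome.fasta from $PBS_O_WORKDIR.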

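#!/bin/bash
#PBS -N TransDecoder

# Submit from the directory containing transcripts.gtf and test.genome.fasta
module load transdecoder
cd "$PBS_O_WORKDIR"
# Convert the Cufflinks GTF to alignment GFF3 and extract the transcript sequences
$TD_HOME/cufflinks_gtf_to_alignment_gff3.pl transcripts.gtf > transcripts.gff3
$TD_HOME/util/cufflinks_gtf_genome_to_cdna_fasta.pl transcripts.gtf test.genome.fasta > transcripts.fasta
# Predict candidate coding regions
TransDecoder -t transcripts.fasta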

Submit with:

qsub -l nodes=1:g24 td.bat
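Because the batch script reads its inputs from the submission directory, a missing or empty input file only shows up once the job is running. A small hypothetical check (not part of TransDecoder or PBS) can catch this before qsub is called; it only tests that each named file exists and is non-empty.

```bash
#!/bin/bash
# Hypothetical pre-submission check: succeed only if every argument names a
# file that exists and is non-empty; report each problem on stderr.
check_inputs() {
    local f status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example:

helix% check_inputs transcripts.gtf test.genome.fasta && qsub -l nodes=1:g24 td.bat

The job is submitted only if both inputs are present; otherwise the missing files are listed and nothing is queued.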

Documentation

TransDecoder website