EMBOSS PACKAGE

EMBOSS stands for "The European Molecular Biology Open Software Suite". EMBOSS contains hundreds of programs (applications) covering areas such as:

  • Sequence alignment
  • Rapid database searching with sequence patterns
  • Protein motif identification, including domain analysis
  • Nucleotide sequence pattern analysis, for example to identify CpG islands or repeats
  • Codon usage analysis for small genomes
  • Rapid identification of sequence patterns in large scale sequence sets

When to use EMBOSS on Biowulf

EMBOSS on Biowulf is intended for processing a large number of sequences, such as hundreds or thousands of query sequences, with the EMBOSS programs. If you have just a few query sequences, you should use the EMBOSS web interface or the command line on Helix. Please contact the Helix Systems staff (staff@helix.nih.gov, or 301-594-6248) if you have questions about your EMBOSS jobs.

EMBOSS Documentation

EMBOSS Database Status

Submit Multiple Jobs Using swarm Program

1. For csh or tcsh users, set the PLPLOT_LIB and emboss_acdroot variables and add /usr/local/emboss/bin to your path. You can insert the following lines at the end of your .cshrc file:

setenv PLPLOT_LIB /usr/local/emboss/plplot/lib
set path=( /usr/local/emboss/bin ${path} )
setenv emboss_acdroot /usr/local/emboss/share/EMBOSS/acd

Or, for bash/ksh/sh users, insert the following at the end of your .bashrc file:

PLPLOT_LIB=/usr/local/emboss/lib
PATH=/usr/local/emboss/bin:$PATH
emboss_acdroot=/usr/local/emboss/share/EMBOSS/acd
export PLPLOT_LIB PATH emboss_acdroot
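After editing your startup file and opening a new shell, it can be useful to confirm that the settings took effect. The following is a minimal sketch for bash/sh users; the function name check_emboss_env is hypothetical (it is not part of EMBOSS or swarm), and it only checks that the two variables are non-empty and that seqret can be found in your PATH.

```shell
# Hypothetical helper (not part of EMBOSS): report which of the settings
# from step 1 are missing in the current shell.
check_emboss_env() {
    rc=0
    # Both variables from step 1 should be set and non-empty.
    [ -n "${PLPLOT_LIB:-}" ] || { echo "PLPLOT_LIB is not set"; rc=1; }
    [ -n "${emboss_acdroot:-}" ] || { echo "emboss_acdroot is not set"; rc=1; }
    # seqret is used here as a representative EMBOSS program.
    command -v seqret >/dev/null 2>&1 || { echo "seqret not found in PATH"; rc=1; }
    return "$rc"
}

# Example use:
#   check_emboss_env && echo "EMBOSS environment looks OK"
```

The function prints one line per missing setting and returns a non-zero status if anything is missing.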

2. Set up a command file to run swarm. For example, to run the EMBOSS program 'seqret' for 2500 sequences, create a file called 'cmdfile' which contains the following lines:

seqret -sequence 'genbank:ab1681*' -outseq 'outseq1'
seqret -sequence 'swissprot:P16310' -outseq 'outseq2'
seqret -sequence 'genpept:M31661' -outseq 'outseq3'

...
seqret -sequence 'refseqnt:nc_011*' -outseq 'outseq4'

Each line in the command file should appear just as it would be entered on the command line.

3. If you have over 1000 commands, especially if each one runs for a short time, you should 'bundle' your jobs with the -b flag. This will greatly speed up your jobs and avoid overloading the cluster. To bundle your jobs, first use the following formula to determine the value BN:

BN = (number of commands) / (number of nodes x 2)

So, for example, if you have 5000 commands in your swarm file, and the current maximum number of nodes per user is 64, then BN = 5000 / (64 x 2) = 39.06 (round up to 40). Then submit the swarm job as below, where 40 is the BN value:

swarm -f cmdfile -b 40
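The formula above can also be computed in the shell instead of by hand. The following is a minimal sketch for bash/sh users; the function name bundle_size is hypothetical, and the node limit (64 in the example) is an input you must supply from the current per-user maximum.

```shell
# Hypothetical helper: bundle_size COMMANDS NODES
# Prints COMMANDS / (NODES x 2), rounded up to the next whole number.
bundle_size() {
    commands=$1
    nodes=$2
    slots=$((nodes * 2))
    # Integer ceiling division: add (slots - 1) before dividing.
    echo $(( (commands + slots - 1) / slots ))
}

bundle_size 5000 64   # prints 40, matching the worked example above
```

To avoid counting lines by hand, the number of commands can be taken from the command file itself, for example with wc -l < cmdfile, and the printed value then passed to swarm with the -b flag.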

4. Sometimes it is very time-consuming to put together a command file for a swarm job by hand (for example, 800 lines in a file). You will probably want to write a simple csh or perl script to build this swarm command file. If you are unfamiliar with csh, Basic scripting with csh may be useful. The following is an example using csh to build a command file:

        helix% cd my_sequence_directory
        helix% touch cmdfile
        helix% foreach file (*)
        foreach> echo "patmatmotifs $file $file.out" >> cmdfile
        foreach> end
        helix%
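For bash/sh users, the same command file can be built with a small shell function. This is a sketch only: the function name build_cmdfile is hypothetical, and it assumes that file names contain no spaces and that every regular file in the directory is a sequence file. Unlike the csh loop, it skips the command file itself if that file is written inside the sequence directory.

```shell
# Hypothetical helper: build_cmdfile SEQUENCE_DIR CMDFILE
# Writes one patmatmotifs command per regular file in SEQUENCE_DIR.
build_cmdfile() {
    seqdir=$1
    cmdfile=$2
    : > "$cmdfile"                        # start with an empty command file
    for file in "$seqdir"/*; do
        # Skip directories and the unexpanded pattern of an empty directory.
        if [ ! -f "$file" ]; then
            continue
        fi
        # Do not list the command file as a sequence file.
        if [ "$file" = "$cmdfile" ]; then
            continue
        fi
        echo "patmatmotifs $file $file.out" >> "$cmdfile"
    done
}

# Example use:
#   build_cmdfile my_sequence_directory cmdfile
```

Each line of the resulting file has the same form as the lines produced by the csh loop, with the directory name included in each file path.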

5. More information about the swarm program

 

