X!Tandem on Biowulf

X!Tandem matches tandem mass spectra with peptide sequences, a process known as protein identification. The program takes an XML file of instructions on its command line and writes its results to an XML file whose path is specified in that input file.

X!Tandem was developed by researchers as part of the Global Proteome Machine Organization.

Small numbers of X!Tandem jobs should be run on a local desktop machine. Running X!Tandem on the Biowulf cluster is useful only if you have large numbers of jobs (hundreds to tens of thousands), since the independent jobs can run simultaneously on different Biowulf nodes.

How to run X!Tandem on Biowulf

Set up an X!Tandem input file for each run (you will probably want to write a script to set up these input files). Note that X!Tandem will by default write output files into its installation area /usr/local/xtandem/ where users do not have write permission, so it is important to use full pathnames in your input file.
-----sample input file---------------------------------
<?xml version="1.0"?>
<bioml>
   <note type="input" 
     label="list path, default parameters">/data/user/myproj/default_input.xml</note>
   <note type="input" 
     label="list path, taxonomy information">/usr/local/xtandem/bin/taxonomy.xml</note>

   <note type="input" label="protein, taxon">other mammals</note>

   <note type="input" label="spectrum, path">/data/user/myproj/spectrum_1.pkl</note>

   <note type="input" label="output, path">/data/user/myproj/output_1.xml</note>
</bioml>
----------------------end of sample file-------------------
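
Since each run needs its own input file, a short script can generate them. The following is a minimal Python sketch, not part of the X!Tandem distribution: the function names `build_input_xml` and `write_input_files`, the `inputN.xml` / `output_N.xml` naming scheme, and the `*.pkl` pattern are assumptions chosen to match the sample above. It rejects relative paths, because output written with a relative path would land in the installation area.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Mirrors the sample input file; only the spectrum and output paths
# change from run to run.
TEMPLATE = """<?xml version="1.0"?>
<bioml>
   <note type="input" label="list path, default parameters">{defaults}</note>
   <note type="input" label="list path, taxonomy information">{taxonomy}</note>
   <note type="input" label="protein, taxon">{taxon}</note>
   <note type="input" label="spectrum, path">{spectrum}</note>
   <note type="input" label="output, path">{output}</note>
</bioml>
"""


def build_input_xml(spectrum, output,
                    defaults="/data/user/myproj/default_input.xml",
                    taxonomy="/usr/local/xtandem/bin/taxonomy.xml",
                    taxon="other mammals"):
    """Return the text of one input file; all paths must be full pathnames."""
    paths = {"spectrum": spectrum, "output": output,
             "defaults": defaults, "taxonomy": taxonomy}
    for name, value in paths.items():
        if not Path(value).is_absolute():
            raise ValueError(f"{name} is not a full pathname: {value}")
    # Escape &, < and > so unusual file names cannot break the XML.
    fields = {name: escape(str(value)) for name, value in paths.items()}
    return TEMPLATE.format(taxon=escape(taxon), **fields)


def write_input_files(spectrum_dir, work_dir, pattern="*.pkl"):
    """Write inputN.xml in work_dir for each spectrum file; return the paths."""
    spectrum_dir = Path(spectrum_dir).resolve()
    work_dir = Path(work_dir).resolve()
    written = []
    for n, spectrum in enumerate(sorted(spectrum_dir.glob(pattern)), start=1):
        input_path = work_dir / f"input{n}.xml"
        output_path = work_dir / f"output_{n}.xml"
        input_path.write_text(build_input_xml(spectrum, output_path))
        written.append(input_path)
    return written
```

Calling `write_input_files("/data/user/myproj", "/data/user/myproj")` would write one input file per spectrum file in that directory. Adjust the default parameter and taxonomy paths to your own project.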

Large numbers of single-threaded jobs like this are submitted using the swarm utility. Set up a swarm command file containing one line for each of your X!Tandem runs. Here is a sample swarm command file:

------------------file sample.com--------------------
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input1.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input2.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input3.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input4.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input5.xml

----------------end of file -------------------------
Submit this file with
swarm -f sample.com
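
For thousands of runs, the command file itself is easier to generate than to type. The sketch below is one possible way to do it in Python; `swarm_lines` and `write_swarm_file` are hypothetical helper names, and the only thing taken from this page is the `tandem.exe` path and the one-command-per-line layout.

```python
import shlex

TANDEM = "/usr/local/xtandem/bin/tandem.exe"  # path used in the sample above


def swarm_lines(input_files, tandem=TANDEM):
    """Return one swarm command line per X!Tandem input file."""
    # shlex.quote leaves ordinary paths untouched and quotes odd ones.
    return [f"{tandem} {shlex.quote(str(path))}" for path in input_files]


def write_swarm_file(input_files, command_file):
    """Write the command file and return the number of commands in it."""
    lines = swarm_lines(input_files)
    with open(command_file, "w") as handle:
        handle.writelines(line + "\n" for line in lines)
    return len(lines)
```

For example, `write_swarm_file(["/data/user/myproj/input1.xml", "/data/user/myproj/input2.xml"], "sample.com")` would produce the first two lines of the sample file shown above.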

Bundling jobs

If you have over 1000 X!Tandem searches to run, they should be bundled with the '-b' flag to swarm. For example, '-b 25' will send 25 of the commands to a single processor, and then submit two such bundles as a single swarm job. This greatly reduces the number of individual jobs, and with it the overhead of submitting large numbers of small jobs. (More information about swarm options)

Thus, to run X!Tandem on 5000 dta files, you would set up a swarm command file with one line per file as described above. This file would be submitted to the swarm program using:

swarm -b 50 -f sample.com
Here, swarm will send 50 commands to a single processor and 50 x 2 = 100 commands to a node as a single batch job. The total will therefore be 5000 / 100 = 50 swarm jobs.
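
The bundling arithmetic above can be checked for other bundle sizes with a few lines of Python. This is only a sketch of the calculation described on this page: `swarm_job_count` is a hypothetical helper, the default of two bundles per job comes from the description above, and rounding a partial last job up to a whole job is an assumption.

```python
def swarm_job_count(n_commands, bundle, bundles_per_job=2):
    """Estimate how many swarm jobs a bundled submission produces."""
    commands_per_job = bundle * bundles_per_job
    # Ceiling division: a partly filled last job still counts as a job.
    return -(-n_commands // commands_per_job)


print(swarm_job_count(5000, 50))  # 5000 / (50 x 2)
print(swarm_job_count(5000, 25))  # 5000 / (25 x 2)
```

With '-b 50' this gives the 50 jobs worked out above; with '-b 25' the same 5000 commands would become 100 jobs.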

Monitoring your jobs

As always, jobs can be monitored using the Biowulf cluster monitors. Click on 'List status of running jobs only', and then your username or job number on the resultant page to view your own jobs only, as in the image on the right.

More information about X!Tandem can be found at the X!Tandem website.


This document is available as http://biowulf.nih.gov/apps/xtandem.html