X!Tandem on Biowulf
X!Tandem matches tandem mass spectra to peptide sequences, a process that has come to be known as protein identification.
It takes an XML file of instructions as its
command-line argument and writes its results to an XML file whose path is
specified in that input file.
X!Tandem was developed by researchers as part of the
Global Proteome Machine Organization.
Small numbers of X!Tandem jobs should be performed on a local desktop machine. Running
X!Tandem on the Biowulf cluster is useful only if you have large numbers (100s, 1000s, or
10000s) of jobs, since the independent jobs can run simultaneously on different Biowulf nodes.
How to run X!Tandem on Biowulf
Set up an X!Tandem input file for each run (you will probably want to write a script
to generate these input files). By default, X!Tandem
writes output files into its installation area,
/usr/local/xtandem/, where users do not have write permission,
so it is important to use full pathnames
in your input file.
-----sample input file---------------------------------
<?xml version="1.0"?>
<bioml>
<note type="input"
label="list path, default parameters">/data/user/myproj/default_input.xml</note>
<note type="input"
label="list path, taxonomy information">/usr/local/xtandem/bin/taxonomy.xml</note>
<note type="input" label="protein, taxon">other mammals</note>
<note type="input" label="spectrum, path">/data/user/myproj/spectrum_1.pkl</note>
<note type="input" label="output, path">/data/user/myproj/output_1.xml</note>
</bioml>
----------------------end of sample file-------------------
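The input files can be generated by a short script. The following is a minimal Python sketch, not a tested Biowulf tool. It assumes that the spectra are named spectrum_N.pkl inside one project directory, that the default parameter file is default_input.xml in that same directory, and that inputs and outputs should be named inputN.xml and output_N.xml as in the samples on this page. The function names are hypothetical.

```python
from pathlib import Path
from xml.sax.saxutils import escape

# Template mirroring the sample input file; only the paths and taxon vary.
TEMPLATE = """<?xml version="1.0"?>
<bioml>
<note type="input" label="list path, default parameters">{defaults}</note>
<note type="input" label="list path, taxonomy information">{taxonomy}</note>
<note type="input" label="protein, taxon">{taxon}</note>
<note type="input" label="spectrum, path">{spectrum}</note>
<note type="input" label="output, path">{output}</note>
</bioml>
"""


def build_input_xml(spectrum, output, defaults, taxonomy, taxon):
    """Return the text of one X!Tandem input file (all paths should be full paths)."""
    fields = dict(spectrum=spectrum, output=output, defaults=defaults,
                  taxonomy=taxonomy, taxon=taxon)
    # Escape &, < and > so unusual file names cannot break the XML.
    return TEMPLATE.format(**{k: escape(str(v)) for k, v in fields.items()})


def write_input_files(project_dir, taxon="other mammals",
                      taxonomy="/usr/local/xtandem/bin/taxonomy.xml"):
    """Write one inputN.xml per spectrum_N.pkl found in project_dir (assumed layout)."""
    project = Path(project_dir).resolve()  # full pathnames, as advised above
    written = []
    for spectrum in sorted(project.glob("spectrum_*.pkl")):
        run_id = spectrum.stem.split("_", 1)[1]  # "spectrum_12" -> "12"
        input_path = project / f"input{run_id}.xml"
        input_path.write_text(build_input_xml(
            spectrum=spectrum,
            output=project / f"output_{run_id}.xml",
            defaults=project / "default_input.xml",
            taxonomy=taxonomy,
            taxon=taxon,
        ))
        written.append(input_path)
    return written
```

Calling write_input_files("/data/user/myproj") would then create one input file per spectrum file in that directory and return the list of files written.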
Large numbers of single-threaded jobs like this are submitted using the
swarm utility. Set up a swarm command file containing one line for each of your X!Tandem runs. Here is a sample swarm command file:
------------------file sample.com--------------------
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input1.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input2.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input3.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input4.xml
/usr/local/xtandem/bin/tandem.exe /data/user/myproj/input5.xml
----------------end of file -------------------------
Submit this file with
swarm -f sample.com
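For hundreds or thousands of runs, the swarm command file itself is easier to generate than to type. The sketch below is one possible way to do that in Python. It assumes the tandem.exe path shown in the sample above, and the function names are hypothetical.

```python
import shlex
from pathlib import Path

# Executable path taken from the sample swarm command file.
TANDEM = "/usr/local/xtandem/bin/tandem.exe"


def swarm_lines(input_files, tandem=TANDEM):
    """Return one swarm command line per X!Tandem input file."""
    # shlex.quote leaves ordinary paths unchanged and protects odd ones.
    return [f"{tandem} {shlex.quote(str(p))}" for p in input_files]


def write_swarm_file(input_files, swarm_file="sample.com", tandem=TANDEM):
    """Write the swarm command file and return the number of commands in it."""
    lines = swarm_lines(input_files, tandem)
    Path(swarm_file).write_text("\n".join(lines) + "\n")
    return len(lines)
```

The resulting file is submitted with swarm exactly as shown above.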
Bundling jobs
If you have over 1000 X!Tandem searches to run, they should be bundled with the '-b' flag to swarm.
'-b 25' will send 25 of the commands to a single processor, and then submit two such bundles as a
single swarm job. This hugely decreases the number of individual jobs and therefore decreases the
overhead for such large numbers of small jobs. (More information about
swarm options)
Thus, to run X!Tandem on 5000 dta files, you would set up a swarm command file with one line
per file as described above. This file would be submitted to the swarm program using:
swarm -b 50 -f sample.com
swarm will send 50 commands to a single processor, and 50 x 2 = 100 commands to a node as a
single batch job. The total will therefore be 5000 / 100 = 50 swarm jobs.
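The same arithmetic can be checked for other bundle sizes with a few lines of Python. This is only a sketch of the calculation described above: the factor of two bundles per job comes from the text, and rounding up when the commands do not divide evenly is an assumption.

```python
import math


def swarm_job_count(n_commands, bundle_size, bundles_per_job=2):
    """Estimate how many swarm jobs result from n_commands at a given -b value."""
    commands_per_job = bundle_size * bundles_per_job  # e.g. 50 x 2 = 100
    # Assumption: a partly filled final job still counts as one job.
    return math.ceil(n_commands / commands_per_job)


print(swarm_job_count(5000, 50))  # the example above: 5000 / 100 = 50
```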
Monitoring your jobs
As always, jobs can be monitored using the Biowulf
cluster monitors. Click on 'List status of running jobs only',
and then your username or job number on the resultant page to view
your own jobs only, as in the image on the right.
More information about X!Tandem can be found at the
X!Tandem website.