Biowulf at the NIH
Merlin on Biowulf

Merlin uses sparse trees to represent gene flow in pedigrees and is one of the fastest pedigree analysis packages available (Abecasis et al., 2002).

For small numbers of Merlin runs, Helix is the simplest system to use. Merlin on Biowulf would typically be used to run large numbers of simultaneous Merlin jobs with different input files or parameters.

Submitting a single Merlin batch job

1. Create a script file containing lines similar to those shown below the dotted line. Modify the command on the last line to use your own input files and the desired parameters:

.....................file /home/username/runMerlin........................
#!/bin/bash
# This file is runMerlin
#
#PBS -N merlin
#PBS -m be
#PBS -k oe
date
/usr/local/bin/merlin -d c1.dat -m c1.map -p simdata --quiet

2. Submit the script using the 'qsub' command, e.g.

qsub -l nodes=1 /home/username/runMerlin
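
The Merlin command in the script uses relative file names (c1.dat, c1.map, simdata), so the job has to run in the directory that holds those files. The following is a minimal sketch of a more defensive version of the script body, not the site's recommended script. It assumes a standard PBS environment in which PBS_O_WORKDIR holds the directory from which qsub was run; check_inputs is a hypothetical helper written for this example and is not part of Merlin or PBS.

```shell
#!/bin/bash
# Sketch only: check that the Merlin input files are readable before running.
# check_inputs is a hypothetical helper, not part of Merlin or PBS.
check_inputs() {
    local f missing=0
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "missing input: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: PBS sets PBS_O_WORKDIR to the directory qsub was run from.
# Fall back to the current directory when the script is run by hand.
cd "${PBS_O_WORKDIR:-.}" || exit 1

# Run Merlin only if the data, map and pedigree files are all present.
if check_inputs c1.dat c1.map simdata; then
    /usr/local/bin/merlin -d c1.dat -m c1.map -p simdata --quiet
else
    echo "Merlin not started: input files are missing" >&2
fi
```

With this layout, running qsub from the directory that contains the input files should be enough for the relative names to resolve.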

Running a 'swarm' of Merlin jobs

The swarm program is a convenient way to submit large numbers of jobs all at once instead of submitting them manually one by one.

1. Create a swarm command file containing a single job on each line, e.g.

................this file is /home/username/merlinjobs.......
/usr/local/bin/merlin -d c1.dat -m c1.map -p simdata --quiet
/usr/local/bin/merlin -d c2.dat -m c2.map -p simdata --quiet
/usr/local/bin/merlin -d c3.dat -m c3.map -p simdata --quiet
/usr/local/bin/merlin -d c4.dat -m c4.map -p simdata --quiet
/usr/local/bin/merlin -d c5.dat -m c5.map -p simdata --quiet
[...]
                        

2. Submit the command file with swarm. One flag, '-f', is required. Two others, '-t' and '-g', are the ones you will most likely need to specify when submitting a swarm job.

-f: name of the swarm command file created above (required)
-t: number of processors per node to use for each command line in the swarm file (optional)
-g: memory, in GB, needed for each command line in the swarm file (optional)

By default, each command line in the swarm file is executed on one processor core of a node and uses 1 GB of memory. If this is not what you want, specify the '-t' and '-g' flags when you submit the job on Biowulf.

For example, if each command line needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:

biowulf> swarm -g 10 -f cmdfile

For more information regarding running swarm, see swarm.html
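
The swarm command file from step 1 follows a regular pattern, so it does not have to be typed by hand. The following is a minimal sketch that writes it with a loop. The output file name merlinjobs and the range 1 to 5 are assumptions chosen to match the example above; adjust both to your own data sets.

```shell
#!/bin/bash
# Sketch only: write one Merlin command per line into a swarm command file.
# The file name and the range 1..5 are assumptions matching the example above.
cmdfile=${cmdfile:-merlinjobs}

: > "$cmdfile"    # start from an empty file
for i in 1 2 3 4 5; do
    echo "/usr/local/bin/merlin -d c${i}.dat -m c${i}.map -p simdata --quiet" >> "$cmdfile"
done
```

The resulting file can then be passed to swarm with '-f', adding '-g' or '-t' as described above if the defaults do not fit.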

Documentation

http://www.sph.umich.edu/csg/abecasis/Merlin/tour/linkage.html