SimWalk2 is a statistical genetics application for haplotype, parametric linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping analyses on pedigrees of any size. SimWalk2 uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms to perform these multipoint analyses.
SimWalk2 was developed by Eric Sobel, Kenneth Lange, Daniel Weeks, Jeff O'Connell, and Goncalo Abecasis at UCLA. See the SimWalk2 documentation at UCLA.
Simwalk is also available on Helix. Users who need relatively few Simwalk runs should use it on Helix. Running Simwalk on Biowulf is advantageous only if you need large numbers of Simwalk runs.
For each Simwalk job, you can set up a batch command file and submit it via qsub to the Biowulf batch system. Each job will then be allotted a node. Since Simwalk is not parallelized, the job will use only one processor of the node, which is an inefficient use of the system. A preferable way to submit large numbers of Simwalk jobs is via the swarm command.
- Set up the Simwalk jobs, each in a directory with the appropriate input files.
- Create a swarm command file as below, with one line for each simwalk job.
------------------ file cmdfile -----------------------
cd /data/user/simwalk/run1; simwalk2
cd /data/user/simwalk/run2; simwalk2
cd /data/user/simwalk/run3; simwalk2
[...etc...]
-------------------------------------------------------
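When there are many run directories, the command file can be generated rather than typed by hand. The following is a minimal sketch, assuming every subdirectory of a base directory is one prepared Simwalk run; make_cmdfile is a hypothetical helper name, not part of swarm or SimWalk2.

```shell
#!/bin/bash
# Hypothetical helper: write one swarm command line per run directory.
# Assumption: each subdirectory of base_dir is one prepared Simwalk run.
make_cmdfile() {
    local base_dir=$1 out_file=$2 run_dir
    : > "$out_file"                       # start with an empty command file
    for run_dir in "$base_dir"/*/; do
        [ -d "$run_dir" ] || continue     # skip if the glob matched nothing
        # ${run_dir%/} strips the trailing slash left by the glob
        echo "cd ${run_dir%/}; simwalk2" >> "$out_file"
    done
}

# Example call, using the paths from the listing above:
# make_cmdfile /data/user/simwalk cmdfile
```

The directories are listed in the shell's glob order, so run10 sorts before run2; the order does not matter to swarm because each line is an independent job.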
One swarm flag, '-f', is required. Two others, '-t' and '-g', are the ones users most often need to specify when submitting a swarm job.
-f: the name of the swarm command file described above (required)
-t: number of processors per node to use for each command line in the swarm file (optional)
-g: GB of memory needed for each command line in the swarm file (optional)
By default, each command line in the swarm file is executed on one processor core of a node and uses 1 GB of memory. If this is not what you want, specify the '-t' and '-g' flags when you submit the job on Biowulf.
For example, if each command line needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:
[user@biowulf ~]$ swarm -g 10 -f cmdfile
For more information on running swarm, see the swarm documentation (swarm.html).
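Before submitting, it can help to confirm that every directory named in the command file actually exists, since a mistyped path would only show up once the job runs. The following is a sketch under the assumption that each line has the form "cd DIR; simwalk2"; check_cmdfile is a hypothetical helper, not a swarm feature.

```shell
#!/bin/bash
# Hypothetical pre-submission check: confirm that every directory named in a
# swarm command file exists. Assumes each line has the form "cd DIR; simwalk2".
check_cmdfile() {
    local cmdfile=$1 missing=0 line dir
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            "cd "*";"*) ;;                # looks like a "cd DIR; ..." line
            *) continue ;;                # ignore anything else
        esac
        dir=${line#cd }                   # drop the leading "cd "
        dir=${dir%%;*}                    # drop everything from the first ";"
        if [ ! -d "$dir" ]; then
            echo "missing directory: $dir" >&2
            missing=$((missing + 1))
        fi
    done < "$cmdfile"
    [ "$missing" -eq 0 ]                  # non-zero status if any are missing
}

# Example use: submit only if the check passes
# check_cmdfile cmdfile && swarm -g 10 -f cmdfile
```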
Typically, Simwalk runs should be done via non-interactive batch or the swarm command. It may sometimes be useful to run interactively for debugging purposes.
This test run uses the files MAP.DAT, LOCUS.DAT, PEDIGREE.DAT and PEN.DAT from the Simwalk Example set and performs the example sampling analysis (file BATCH-01.DAT). These files can be copied from /usr/local/src/simwalk/SimWalk289/Examples.
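The copy step might look like the following sketch. copy_examples is a hypothetical helper, and the destination directory name is an assumption; only the file names and the source path come from the text above.

```shell
#!/bin/bash
# Sketch: copy the example inputs named above into a working directory.
# copy_examples is a hypothetical helper, not part of SimWalk2.
copy_examples() {
    local src=$1 dest=$2 f
    mkdir -p "$dest" || return 1
    for f in MAP.DAT LOCUS.DAT PEDIGREE.DAT PEN.DAT BATCH-01.DAT; do
        if [ ! -f "$src/$f" ]; then
            echo "not found: $src/$f" >&2   # stop at the first missing file
            return 1
        fi
        cp "$src/$f" "$dest/" || return 1
    done
}

# Example call (destination directory is an assumption):
# copy_examples /usr/local/src/simwalk/SimWalk289/Examples ~/mydir
```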
[user@biowulf ~]$ qsub -I -l nodes=1
qsub: waiting for job 521768.biobos to start
qsub: job 521768.biobos ready

[user@p554 ~]$ cd mydir
[user@p554 ~/mydir]$ simwalk2

SimWalk2 version 2.89

Type of data analysis:      Pedigree Sampling
Locus data INPUT file:      LOCUS.DAT
Pedigree data INPUT file:   PEDIGREE.DAT
Map data INPUT file:        MAP.DAT
Individual OUTPUT files:    MODEL-01.mmm
Copy of all screen output:  VIDEO-01.TXT

Here 'mmm' is from the order within the input pedigree file,
e.g., '001' for the first pedigree, etc.

Working on data initialization ...

WARNING. In the locus file: LOCUS.DAT
for the following loci, minor adjustments had to be made
to the allele frequencies to force them to sum to 1.0:
CACNL1A1 pY2/1 KCNA5 S93

Map data file 'MAP.DAT' completed initialization;
Locus data file 'LOCUS.DAT' completed initialization;
Pedigree #001 completed initialization;
All data completed initialization.

Working on pedigree analysis ...

Pedigree #001 ('20') working on simulated annealing ...
(Found an initial consistent state.)
25% done ...
50% done ...
75% done ...
Pedigree #001 ('20') completed simulated annealing.
Pedigree #001 ('20') working on Markov chain Monte Carlo process ...
25% done ...
50% done ...
75% done ...
Pedigree #001 ('20') completed Markov chain Monte Carlo process.
Pedigree #001 ('20') completed all analyses.
All individual pedigrees completed analysis.

Please see the following output files.

Individual OUTPUT files:    MODEL-01.mmm
Copy of all screen output:  VIDEO-01.TXT

Here 'mmm' is from the order within the input pedigree file,
e.g., '001' for the first pedigree, etc.

Program run completed!

[user@p554 mydir]$ exit
[user@biowulf ~]$