MACH 1.0 is a Markov Chain based haplotyper. It can resolve long haplotypes or infer missing genotypes in samples of unrelated individuals.
Mach was developed by Goncalo Abecasis at the University of Michigan. [Mach website].
Small numbers of simultaneous Mach jobs (fewer than 3) are most easily run on Helix. Mach on Biowulf is intended for large numbers of simultaneous jobs, or for Mach jobs that will run for a long time.
Minimac is a low memory, computationally efficient implementation of the MaCH algorithm for genotype imputation. It is related to Mach and will get loaded as part of the Mach module. [Minimac webpage].
ChunkChromosome is a helper utility for minimac and MaCH. It can be used to facilitate analyses of very large datasets in overlapping slices. It will get loaded as part of the Mach module. [ChunkChromosome webpage].
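To make the idea of "overlapping slices" concrete, the sketch below computes marker ranges for a chromosome split into core chunks, each padded with flanking markers on both sides. It is only an illustration of the concept: the function name chunk_ranges and the values used are hypothetical, and it does not reproduce ChunkChromosome's own options or output format.

```shell
# Print 1-based marker ranges for overlapping slices.
# Arguments (all illustrative, not ChunkChromosome defaults):
#   $1 = total number of markers
#   $2 = number of markers in each core chunk
#   $3 = number of flanking markers added on each side
chunk_ranges() {
    local n="$1" size="$2" overlap="$3"
    local start=1
    while [ "$start" -le "$n" ]; do
        # Core chunk: the markers this slice is responsible for.
        local core_end=$((start + size - 1))
        if [ "$core_end" -gt "$n" ]; then core_end="$n"; fi
        # Slice: the core plus flanks, clipped to the chromosome ends.
        local lo=$((start - overlap))
        local hi=$((core_end + overlap))
        if [ "$lo" -lt 1 ]; then lo=1; fi
        if [ "$hi" -gt "$n" ]; then hi="$n"; fi
        echo "core $start-$core_end slice $lo-$hi"
        start=$((core_end + 1))
    done
}
```

For example, 'chunk_ranges 10 4 1' lists three slices whose flanks overlap the neighbouring cores by one marker.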
The utility FCgene, a format converting tool for genotyped data (e.g. PLINK-MACH, MACH-PLINK) is also available. Type 'module load fcgene' to add the binary to your path, and then 'fcgene' to run it.
It is easiest to run a large number of simultaneous Mach jobs via swarm.
The Mach environment can be set up with the 'module load mach1' command. This will load the latest version:
[user@biowulf]$ module load mach1
[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) mach1/1.0.18
To load a specific version, use the modules commands to see available versions and load one, as in the example below:
[user@biowulf]$ module avail mach1
---------------- /usr/local/Modules/3.2.9/modulefiles --------------
mach1/1.0.12 mach1/1.0.17 mach1/1.0.18
[user@biowulf]$ module load mach1/1.0.17
[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) mach1/1.0.17
Set up a swarm command file with one Mach command per line:
#------- this file is cmdfile ------------------
mach1 --datfile sample1.dat --pedfile sample1.ped
mach1 --datfile sample2.dat --pedfile sample2.ped
mach1 --datfile sample3.dat --pedfile sample3.ped
mach1 --datfile sample4.dat --pedfile sample4.ped
[...]
Submit this file to the batch system with:
swarm -f cmdfile --module mach1/1.0.18
If each Mach process requires more than 1 GB of memory, use
swarm -g # -f cmdfile --module mach1/1.0.18
where '#' is the number of gigabytes of memory required by each Mach process.
The swarm program will package the commands for best efficiency and send them to the batch system.
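When there are many samples, the swarm command file does not have to be written by hand. The sketch below shows one possible way to generate it, assuming (as in the sample1.dat / sample1.ped example) that each sample has a .dat file and a .ped file with the same base name in one directory. The function name make_cmdfile is hypothetical; only the mach1 --datfile and --pedfile options shown on this page are used.

```shell
# Write one mach1 command per sample into a swarm command file.
# Arguments:
#   $1 = directory holding <sample>.dat and <sample>.ped (assumed naming)
#   $2 = path of the swarm command file to create
make_cmdfile() {
    local dir="$1" out="$2"
    : > "$out"                          # start with an empty command file
    for dat in "$dir"/*.dat; do
        [ -e "$dat" ] || continue       # no .dat files matched the pattern
        local ped="${dat%.dat}.ped"     # matching pedigree file name
        if [ -f "$ped" ]; then
            echo "mach1 --datfile $dat --pedfile $ped" >> "$out"
        else
            echo "skipping $dat: no matching $ped" >&2
        fi
    done
}
```

Running 'make_cmdfile . cmdfile' in the data directory would produce a file that can be passed to 'swarm -f cmdfile' as described above. Samples without a matching .ped file are reported on standard error and left out.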