Biowulf at the NIH
Impute on Biowulf

IMPUTE is a program for estimating ("imputing") unobserved genotypes in SNP association studies. The program is designed to work seamlessly with the output of the genotype calling program CHIAMO and the population genetic simulator HAPGEN, and it produces output that can be analyzed using the program SNPTEST. See the IMPUTE website at Oxford for more information.

The associated programs snptest, gtool and qctool are also available in the /usr/local/impute directory. All these executables will become available in your path if you set up the environment with 'module load impute' (once per session). If you expect to use these programs frequently, you can add 'module load impute' to your .bashrc or .cshrc file.

module load impute
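
To make the module load persistent for bash users, the line can be appended to .bashrc once. The following is a minimal sketch, assuming a bash login shell and a .bashrc in your home directory; the variable name rc_file is only illustrative. It checks for an existing entry first so that running it repeatedly does not add duplicate lines.

```shell
# Sketch: add 'module load impute' to ~/.bashrc only if it is not already there.
# rc_file is an illustrative variable name; csh/tcsh users would edit ~/.cshrc instead.
rc_file="$HOME/.bashrc"

# grep -x matches the whole line, -F treats the pattern as a fixed string.
if ! grep -qxF 'module load impute' "$rc_file" 2>/dev/null; then
    echo 'module load impute' >> "$rc_file"
fi
```

The change takes effect in new shell sessions.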

Small numbers of Impute jobs (fewer than 3 simultaneous) should be run on Helix. Running Impute on Biowulf is only useful if you want to run large numbers of simultaneous Impute jobs. The easiest way to set up multiple Biowulf jobs is via the swarm program.

Setting up a swarm of Impute jobs

Set up a swarm command file with one line for each Impute run. If running a swarm, it's best to add 'module load impute' to your .bashrc or .cshrc file, so that the executables are in the path for every job. Example:

# this file is impute_swarm
cd /data/user/dir1; impute2 -ref_samp_out -m chr16.map -h chr16.haps  -l chr16.legend -g gtypes -s refstrand1  -Ne 11418 -int 5000000 5500000 -buffer 250 -k 10 -iter 10 -burnin 3  -o out1  -i info1  -r summary1
cd /data/user/dir2; impute2 -ref_samp_out -m chr26.map -h chr26.haps  -l chr26.legend -g gtypes -s refstrand2  -Ne 22428 -int 5000000 5500000 -buffer 250 -k 20 -iter 20 -burnin 3  -o out2  -i info2  -r summary2
cd /data/user/dir3; impute2 -ref_samp_out -m chr36.map -h chr36.haps  -l chr36.legend -g gtypes -s refstrand3  -Ne 33438 -int 5000000 5500000 -buffer 250 -k 30 -iter 30 -burnin 3  -o out3  -i info3  -r summary3
[...]
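
When the runs differ only in a few parameters, the command file can be generated with a short script rather than typed by hand. The following is a sketch under stated assumptions, not part of the Biowulf setup: it assumes the runs cover consecutive 500 kb intervals of the same chromosome in a single directory, and the directory, interval bounds, and output file names are illustrative. Only impute2 options that appear in the example above are used.

```shell
#!/bin/sh
# Sketch: write one impute2 command line per interval into a swarm command file.
# All values below are illustrative assumptions; adjust them to your own data.
swarm_file=impute_swarm
workdir=/data/user/dir1   # hypothetical working directory
start=5000000             # start of the first interval (bp)
end=6500000               # end of the last interval (bp)
step=500000               # interval width (bp)

echo "# this file is $swarm_file" > "$swarm_file"

lo=$start
n=1
while [ "$lo" -lt "$end" ]; do
    hi=$((lo + step))
    # One line per run: change to the working directory, then run impute2 on [lo, hi].
    echo "cd $workdir; impute2 -ref_samp_out -m chr16.map -h chr16.haps -l chr16.legend -g gtypes -s refstrand1 -Ne 11418 -int $lo $hi -buffer 250 -k 10 -iter 10 -burnin 3 -o out$n -i info$n -r summary$n" >> "$swarm_file"
    lo=$hi
    n=$((n + 1))
done
```

With the values shown, the script writes three command lines, each with its own interval and its own output, info, and summary file names.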

If each Impute process requires less than 1 GB of memory, submit the command file (impute_swarm in the example above) to the batch system with the command:

swarm -f impute_swarm

If each Impute process requires more than 1 GB of memory, use

swarm -g # -f impute_swarm

where '#' is the number of gigabytes of memory required by each Impute process.

Documentation

IMPUTE user manual

IMPUTE v2 documentation

GTOOL

SNPTEST