Biowulf at the NIH
Plink on Biowulf

PLINK is a whole-genome association analysis toolset, designed to perform a range of basic, large-scale analyses in a computationally efficient manner.

The focus of PLINK is purely on analysis of genotype/phenotype data, so there is no support for steps prior to this (e.g. study design and planning, generating genotype calls from raw data). Through integration with gPLINK and Haploview, there is some support for the subsequent visualization, annotation and storage of results.

PLINK (one syllable) is being developed by Shaun Purcell at the Center for Human Genetic Research (CHGR), Massachusetts General Hospital (MGH), and the Broad Institute of Harvard & MIT, with the support of others.

PLINK is not a parallel program. Single PLINK jobs should be run interactively on the Biowulf interactive nodes or Helix. If you have multiple PLINK jobs to run, the swarm utility is the easiest way to run them.

Available versions of PLINK can be seen and loaded using the module commands, as in the example below:

biowulf% module avail plink

----------------- /usr/local/Modules/3.2.9/modulefiles ----------------------
plink/1.06    plink/1.07    plinkseq/0.08

biowulf% module load plink

biowulf% module list
Currently Loaded Modulefiles:
  1) plink/1.07

The utility FCgene, a format-conversion tool for genotype data (e.g. PLINK to MACH, MACH to PLINK), is also available. Type 'module load fcgene' to add the binary to your path, and then 'fcgene' to run it.

Submitting a swarm of Plink jobs

The swarm program is a convenient way to submit large numbers of jobs all at once instead of manually submitting them one by one.

Create a swarm command file along the lines of the one below:

cd /data/$USER/myseqs; plink --noweb --ped file1.ped --map file1.map --assoc
cd /data/$USER/myseqs; plink --noweb --ped file2.ped --map file2.map --assoc
cd /data/$USER/myseqs; plink --noweb --ped file3.ped --map file3.map --assoc
[...etc...]
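For many input sets, the command file need not be written by hand. The following sketch generates it with a shell loop; it assumes three input sets named file1.ped/file1.map through file3.ped/file3.map in /data/$USER/myseqs (adjust the loop and paths to your own data):

```shell
#!/bin/bash
# Write one plink command per line into 'cmdfile' for use with swarm.
# The backslash keeps $USER literal in cmdfile; swarm expands it at run time.
for i in 1 2 3; do
    echo "cd /data/\$USER/myseqs; plink --noweb --ped file${i}.ped --map file${i}.map --assoc"
done > cmdfile
```

Each generated line is independent, which is what swarm requires: every line runs as its own job on its own core.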

Submit this swarm with:

swarm -f cmdfile --module plink/1.07

By default, each line of the command file is executed on a single processor core of a node and is allocated 1 GB of memory. If a plink command requires more than 1 GB of memory, you must specify the memory required using the -g # flag to swarm. For example, if each command requires 10 GB of memory, submit the swarm with:

swarm -g 10 -f cmdfile --module plink/1.07

For more information on running swarm, see swarm.html.

Submitting a single Plink job

Single PLINK jobs would typically be submitted only for debugging purposes.

1. Create a script file containing the PLINK commands, as below:

---------- /data/user/plink/run1/script --------------
#!/bin/bash -v
#PBS -N plinkJobName
#PBS -m be
#PBS -k oe

cd /data/$USER/plink/t1
plink --noweb --file test1
plink --noweb --file test1 --freq
plink --noweb --file test1 --assoc
plink --noweb --file test1 --make-bed
----------------- end of script ----------------------

2. Now submit the script using the 'qsub' command, e.g.

qsub -l nodes=1 /data/user/plink/run1/script

Documentation

http://pngu.mgh.harvard.edu/~purcell/plink/