Biowulf at the NIH
Plink/Seq on Biowulf

PLINK/SEQ is an open-source C/C++ library for working with human genetic variation data. The specific focus is to provide a platform for analytic tool development for variation data from large-scale resequencing projects, particularly whole-exome and whole-genome studies. However, the library could in principle be applied to other types of genetic studies, including whole-genome association studies of common SNPs.

Plink/Seq was developed at Harvard University.

Programs Location

/usr/local/plinkseq

Submitting a single batch job

1. Create a script file similar to the one below. The plinkseq executables are added to your PATH by including the 'module load plinkseq' command in your script file.

#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N plinkseq
#PBS -m be
#PBS -k oe

module load plinkseq

cd /data/user/somewhereWithInputFile
pseq ex1.vcf v-view --vmeta --gmeta

2. Submit the script using the 'qsub' command on Biowulf.

[user@biowulf]$ qsub -l nodes=1 /data/username/theScriptFileAbove

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

module load plinkseq; pseq ex1.vcf v-view --vmeta --gmeta
module load plinkseq; pseq ex2.vcf v-view --vmeta --gmeta
module load plinkseq; pseq ex3.vcf v-view --vmeta --gmeta
module load plinkseq; pseq ex4.vcf v-view --vmeta --gmeta
[... etc...]
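Rather than writing each line by hand, a command file like the one above can be generated with a short shell loop. This is a minimal sketch, assuming your input files are named ex*.vcf and live in the current directory (the filenames and output name 'cmdfile' are illustrative):

```shell
# Write one swarm command line per VCF file in the current directory.
# Each line loads the plinkseq module and runs the same pseq query.
for vcf in ex*.vcf; do
    echo "module load plinkseq; pseq $vcf v-view --vmeta --gmeta"
done > cmdfile
```

After running the loop, inspect cmdfile to confirm one line was written per input VCF before submitting it with swarm.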

Submit this swarm of jobs with:

swarm -f cmdfile

By default, each line of the command file above is executed on one processor core of a node and may use up to 1 GB of memory. If each of your Plinkseq commands requires more than 1 GB of memory, you must specify the required memory using the -g flag to swarm. For example, if each command requires 5 GB of memory, you would submit with:

swarm -g 5 -f cmdfile

For more information about running swarm, see swarm.html

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

[user@biowulf] $ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
      
[user@p4]$ module load plinkseq
        
[user@p4]$ cd /data/userID/plinkseq/run1
        
[user@p4]$ pseq ex4.vcf v-view --vmeta --gmeta

chr1:1001  rs1001    T/C   .   1   PASS   .   VM=1;SM=100
  P001  1  C/C [GM=1]
  P002  1  T/T [GM=2]
  P003  1  T/C [GM=3]
  P004  1  C/C [GM=4]
[...etc...]
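Output like the above can be post-processed with standard command-line tools. As a hedged sketch (the awk approach is our own, not part of Plink/Seq), this tallies genotype calls from the indented per-individual lines, where the genotype is the third field; here the sample output from above is fed in via a heredoc, but in practice you would pipe or redirect pseq's output:

```shell
# Count each genotype (C/C, T/T, T/C, ...) on the indented
# per-individual lines of pseq v-view output.
awk '/^ / { counts[$3]++ } END { for (g in counts) print g, counts[g] }' <<'EOF'
chr1:1001  rs1001    T/C   .   1   PASS   .   VM=1;SM=100
  P001  1  C/C [GM=1]
  P002  1  T/T [GM=2]
  P003  1  T/C [GM=3]
  P004  1  C/C [GM=4]
EOF
```

For the sample above this prints a count of 2 for C/C and 1 each for T/T and T/C.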

[user@p4]$ exit
qsub: job 2236960.biobos completed

[user@biowulf]$ 

Users may add a node property to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this:

[user@biowulf]$ qsub -I -l nodes=1:g24

Documentation

http://atgu.mgh.harvard.edu/plinkseq/