Biowulf at the NIH
ShapeIT on Biowulf

Segmented HAPlotype Estimation and Imputation Tool (SHAPEIT) is a fast and accurate haplotype inference program. Its main features are:

- Linear complexity with the number of SNPs/individuals in the sample.
- Linear complexity with the number of conditioning haplotypes used in each update step.
- A whole chromosome of a GWAS-scale dataset can be phased in a single run.
- Mixed samples of Trios, Duos and Unrelateds are handled.
- Phasing is multi-threaded to decrease computational time on multi-core computers.

SHAPEIT was developed in C++ by Olivier Delaneau (olivier.delaneau at gmail.com) under the supervision of Jean-Francois Zagury. Additional versions are being developed with the co-supervision of Jonathan Marchini.

The paper to cite if SHAPEIT is used:
O. Delaneau, J. Marchini, JF. Zagury. A linear complexity phasing method for thousands of genomes. Nature Methods 2011 (To appear).

The easiest way to select a version of SHAPEIT is to use the modules commands, as in the example below:

biowulf% module avail shapeit

------------ /usr/local/Modules/3.2.9/modulefiles -----------------
shapeit/1.r416      shapeit/2.r644

biowulf% module load shapeit

biowulf% module list
Currently Loaded Modulefiles:
  1) shapeit/2.r644

biowulf% module unload shapeit

biowulf% module load shapeit/1.r416 

biowulf% module list
Currently Loaded Modulefiles:
  1) shapeit/1.r416

Submitting a single batch job
The example files used on this page are those provided with SHAPEIT. You can copy them from the system area with:
mkdir /data/$USER/shapeit_example
cp -r /usr/local/apps/shapeit/2.r644/example /data/$USER/shapeit_example

1. Create a script file along the lines of the one below:

#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load shapeit/2.r644

cd /data/user/somewhereWithInputFile
shapeit --threads 4 \
   --input-bed chr20.unphased.bed chr20.unphased.bim chr20.unphased.fam \
   --input-map chr20.gmap.gz --output-max chr20.phased.haps chr20.phased.sample

2. Submit the script using the 'qsub' command on Biowulf.

[user@biowulf]$ qsub -l nodes=1:c4 /data/username/theScriptFileAbove

In the script above, shapeit is run with '--threads 4', so the job must be submitted to a node with 4 cores (c4). If you change the number of threads in your script, change the node type in the qsub command to match. Run 'freen' to see the available node types.

The output of 'freen' shows that the nodes with 4 cores have 8 GB of memory. If your job needs more memory, submit it to a different type of node.
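
One way to keep the thread count and the node type in step is to derive both from a single value. The sketch below only prints the qsub command instead of running it. It rests on two assumptions that are not stated on this page: that node properties follow the c<cores> pattern for counts other than 4 (check 'freen'), and that the batch script reads a THREADS variable passed with qsub's -v option in place of the hard-coded thread count. The function name qsub_cmd is hypothetical.

```shell
#!/bin/sh
# Print (do not run) a qsub command whose node property matches the thread count.
# Assumption: node properties are named c<cores>; only c4 is confirmed above.
# Assumption: the batch script uses $THREADS instead of a hard-coded value.
qsub_cmd() {
    threads=$1
    script=$2
    echo "qsub -l nodes=1:c${threads} -v THREADS=${threads} ${script}"
}

# Reproduces the node request from step 2 for a 4-thread job.
qsub_cmd 4 /data/username/theScriptFileAbove
```

With this approach the shapeit line in the batch script would take its thread count from "$THREADS", so the number is written in only one place.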

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

shapeit -B chr1.unphased -M chr1.gmap.gz -O chr1.phased -T 8
shapeit -B chr2.unphased -M chr2.gmap.gz -O chr2.phased -T 8
shapeit -B chr3.unphased -M chr3.gmap.gz -O chr3.phased -T 8
shapeit -B chr4.unphased -M chr4.gmap.gz -O chr4.phased -T 8

In this swarm command file, note that each shapeit command is run with 8 threads (-T 8). If each command requires 4 GB of memory, submit the swarm with:

swarm -f cmdfile -t 8 -g 4 --module shapeit

The number of threads (-t 8) in the swarm command must match the number of threads specified in the shapeit command.
'-g 4' indicates that each command requires 4 GB of memory.
'--module shapeit' tells swarm that the shapeit module must be loaded for each job.
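
For many chromosomes, the command file can be generated rather than typed by hand. The following is a minimal sketch that assumes the chrN.unphased / chrN.gmap.gz naming from the sample file; the function name make_swarm_cmds and its arguments are hypothetical.

```shell
#!/bin/sh
# Print one shapeit command per chromosome, in the format of the sample file.
# Arguments: first chromosome, last chromosome, threads per command.
# Assumption: input files are named chrN.unphased.* and chrN.gmap.gz.
make_swarm_cmds() {
    chr=$1
    last=$2
    threads=$3
    while [ "$chr" -le "$last" ]; do
        echo "shapeit -B chr${chr}.unphased -M chr${chr}.gmap.gz -O chr${chr}.phased -T ${threads}"
        chr=$((chr + 1))
    done
}

# Prints the four lines of the sample command file.
make_swarm_cmds 1 4 8
```

Redirect the output to the command file (for example, make_swarm_cmds 1 22 8 > cmdfile) and pass the same thread count to swarm with -t.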

For more information regarding running swarm, see swarm.html

 

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the job there.

[user@biowulf]$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ module load shapeit
[user@p4]$ cd /data/$USER/mydir
[user@p4]$ shapeit -B chr20.unphased -M chr20.gmap.gz -T 8 --output-graph chr20.phased.hgraph
[user@p4]$ shapeit -convert --input-graph chr20.phased.hgraph -O chr20.phased.max
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$
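
The session above has two steps: phasing into a haplotype graph, then converting that graph. The graph file written by the first step must be the one read by the second. As a sketch, both commands can be built from one prefix so the file names cannot drift apart. The function name phase_and_convert_cmds is hypothetical, and the commands are printed rather than run; only options that appear on this page are used.

```shell
#!/bin/sh
# Print the phase and convert commands for one dataset prefix.
# The graph file name is defined once and reused in both steps.
# Assumption: inputs are named <prefix>.unphased.* and <prefix>.gmap.gz.
phase_and_convert_cmds() {
    prefix=$1
    threads=$2
    graph="${prefix}.phased.hgraph"
    echo "shapeit -B ${prefix}.unphased -M ${prefix}.gmap.gz -T ${threads} --output-graph ${graph}"
    echo "shapeit -convert --input-graph ${graph} -O ${prefix}.phased.max"
}

phase_and_convert_cmds chr20 8
```

Check the printed commands, then paste them into the interactive session or pipe them to sh.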

Users may add a node property to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, use:

[user@biowulf]$ qsub -I -l nodes=1:g24

 

Documentation

http://www.shapeit.fr/