Biowulf at the NIH
SAMtools on Biowulf

SAMtools provides utilities for manipulating alignments in the SAM format, including sorting, merging, indexing, and generating alignments in a per-position format.

Programs Location


The easiest way to add samtools, bcftools and the associated misc tools to your path is with the 'module' command, as in the example below:

[user@biowulf]$ module avail samtools                   (see what versions are available)

------------------- /usr/local/Modules/3.2.9/modulefiles -------------------
samtools/0.1.12a         samtools/0.1.15          samtools/0.1.18(default)
samtools/0.1.13          samtools/0.1.17

[user@biowulf]$ module load samtools                     (load the default version)

[user@biowulf]$ module list                              (see what version is loaded)
Currently Loaded Modulefiles:
  1) samtools/0.1.18

[user@biowulf]$ module unload samtools                   (unload this version)

[user@biowulf]$ module load samtools/0.1.15              (load a specific version)

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) samtools/0.1.15

Submitting a single SAMtools batch job

The samtools sample files ex1.fa and ex1.sam.gz can be copied from /usr/local/src/samtools/.

1. Copy the sample files to your own area. ex1.sam.gz is a gzip-compressed SAM file that 'samtools import' reads directly, so it does not need to be unpacked.

mkdir -p /data/user/samtools/run1; cd /data/user/samtools/run1; cp /usr/local/src/samtools/ex1.* .

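2. Create a script file similar to the one below:

#!/bin/bash
# This file is runSamtools
#PBS -N Samtools
#PBS -m be
#PBS -k oe

module load samtools

# Work in the directory that holds the copied sample files
cd /data/user/samtools/run1
samtools faidx ex1.fa
samtools import ex1.fa.fai ex1.sam.gz ex1.bam
samtools index ex1.bam
# tview is an interactive viewer and needs a terminal, so run it from an
# interactive session rather than from the batch job:
# samtools tview ex1.bam ex1.fa
# Note: 'pileup' was dropped from samtools in release 0.1.17; with that
# version or later, use 'samtools mpileup -f ex1.fa ex1.bam' instead.
samtools pileup -cf ex1.fa ex1.bam

3. Submit the script using the 'qsub' command on Biowulf. You can type 'freen' on the Biowulf head node to see which node types are available and choose one that fits your needs. In this example, the job is submitted to a g8 node, which has 8 GB of memory:

qsub -l nodes=1:g8 /data/username/runSamtools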
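
Before submitting, it can be worth confirming that the input files from the copy step are actually in place, since a batch job with missing inputs only fails after it has waited in the queue. The following is a minimal sketch; check_inputs is a hypothetical helper written for this illustration and is not part of samtools or the batch system.

```shell
#!/bin/bash
# Pre-flight check: confirm that every input file exists and is non-empty.
# check_inputs is a hypothetical helper, not part of samtools or PBS.
check_inputs() {
    local f
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}
```

For example, running check_inputs ex1.fa ex1.sam.gz in the run directory returns a non-zero status and names the first problem file if either input is absent, so it can be chained in front of the qsub command with '&&'.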

Submitting a swarm of Samtools jobs

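Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Each command in the file runs as a separate process; a trailing '\' continues a command onto the next line. Here is a sample file: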

cd /data/user/samtools/run/ex1; \
  samtools faidx ex1.fa; \
  samtools import ex1.fa.fai ex1.sam.gz ex1.bam; \
  samtools index ex1.bam ; \
  samtools tview ex1.bam ex1.fa ; \
  samtools pileup -cf ex1.fa ex1.bam
cd /data/user/samtools/run/ex2; \
  samtools faidx ex2.fa; \
  samtools import ex2.fa.fai ex2.sam.gz ex2.bam; \
  samtools index ex2.bam ; \
  samtools tview ex2.bam ex2.fa ; \
  samtools pileup -cf ex2.fa ex2.bam
cd /data/user/samtools/run/ex3; \
  samtools faidx ex3.fa; \
  samtools import ex3.fa.fai ex3.sam.gz ex3.bam; \
  samtools index ex3.bam ; \
  samtools tview ex3.bam ex3.fa ; \
  samtools pileup -cf ex3.fa ex3.bam
cd /data/user/samtools/run/ex4; \
  samtools faidx ex4.fa; \
  samtools import ex4.fa.fai ex4.sam.gz ex4.bam; \
  samtools index ex4.bam ; \
  samtools tview ex4.bam ex4.fa ; \
  samtools pileup -cf ex4.fa ex4.bam
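
Because the four commands above differ only in the sample name, the command file can also be generated with a loop instead of being typed by hand. The following is a minimal sketch, assuming the same directory layout (/data/user/samtools/run/exN); the variable names and the output file name 'cmdfile' are illustrative. It writes each command on a single line, and it leaves out the tview step because tview is an interactive viewer that needs a terminal.

```shell
#!/bin/bash
# Generate a swarm command file with one single-line command per sample.
# Assumption: each sample exN has its own directory under $base, as in the
# sample command file above. The samtools commands are copied from that file.
base=/data/user/samtools/run
cmdfile=cmdfile

: > "$cmdfile"    # create or truncate the command file
for s in ex1 ex2 ex3 ex4; do
    echo "cd ${base}/${s}; samtools faidx ${s}.fa; samtools import ${s}.fa.fai ${s}.sam.gz ${s}.bam; samtools index ${s}.bam; samtools pileup -cf ${s}.fa ${s}.bam" >> "$cmdfile"
done
```

Adding more samples then only requires extending the list in the 'for' line.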

Submit this swarm with:

swarm -f cmdfile --module samtools

By default, each command in the file runs on one processor core of a node and is allotted 1 GB of memory.

If each command needs more than 1 GB of memory, for example 4 GB, tell swarm by including the '-g 4' flag:

swarm -g 4 -f cmdfile --module samtools
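
Since the memory setting applies per command, it helps to know how many commands a command file really contains when continuation lines are used. The following is a minimal sketch that counts them; count_swarm_commands is a hypothetical helper, and it assumes that only lines ending in a backslash are continued. The demo file is a shortened stand-in for the sample command file.

```shell
#!/bin/bash
# Count the commands in a swarm command file.
# Assumption: a line ending in a backslash continues onto the next line,
# so only lines that do NOT end in a backslash terminate a command.
# count_swarm_commands is a hypothetical helper, not part of swarm.
count_swarm_commands() {
    # Drop blank lines, then count lines without a trailing backslash.
    grep -v '^[[:space:]]*$' "$1" | grep -c -v '\\[[:space:]]*$'
}

# Demo on a small file with two multi-line commands.
demo=$(mktemp)
cat > "$demo" <<'EOF'
cd /data/user/samtools/run/ex1; \
  samtools faidx ex1.fa; \
  samtools import ex1.fa.fai ex1.sam.gz ex1.bam
cd /data/user/samtools/run/ex2; \
  samtools faidx ex2.fa; \
  samtools import ex2.fa.fai ex2.sam.gz ex2.bam
EOF

n=$(count_swarm_commands "$demo")
echo "$n commands, $((n * 4)) GB in total at 4 GB each"
rm -f "$demo"
```

With '-g 4', the two commands in the demo file would together request 8 GB, spread over however many nodes swarm assigns.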

For more information about running swarm, see swarm.html.