Biowulf at the NIH
Genome Mapping and Assembly with MAQ on Biowulf

Maq stands for Mapping and Assembly with Quality. It builds mapping assemblies by aligning short reads, generated by next-generation sequencing machines, to reference sequences. It is designed primarily for the Illumina-Solexa 1G Genetic Analyzer and has preliminary support for ABI SOLiD data.

Maq first aligns reads to reference sequences and then calls the consensus. At the mapping stage, Maq performs ungapped alignment. For single-end reads, Maq finds all hits with up to 2 or 3 mismatches, depending on a command-line option; for paired-end reads, it always finds all paired hits in which one of the two reads contains up to 1 mismatch. At the assembly stage, Maq calls the consensus based on a statistical model: it calls the base that maximizes the posterior probability and calculates a phred quality at each position along the consensus. Heterozygotes are also called in this process.
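
The two stages described above correspond to separate maq subcommands. The following is a minimal sketch rather than the procedure used on this page: the function name run_maq_stages and all file names are hypothetical, and it assumes maq is already on the PATH. The -n option of 'maq map' sets the number of mismatches that are always found (2 by default).

```shell
# Hypothetical helper: run the mapping stage, then the consensus stage.
# Arguments: reference FASTA, reads FASTQ, output directory (all placeholders).
run_maq_stages() {
    ref=$1
    reads=$2
    out=$3
    mkdir -p "$out" || return 1

    # Convert the inputs to maq's binary formats.
    maq fasta2bfa "$ref" "$out/ref.bfa" || return 1
    maq fastq2bfq "$reads" "$out/reads.bfq" || return 1

    # Mapping stage: ungapped alignment, here with up to 2 mismatches.
    maq map -n 2 "$out/aln.map" "$out/ref.bfa" "$out/reads.bfq" || return 1

    # Assembly stage: call the consensus, then extract SNP candidates.
    maq assemble "$out/consensus.cns" "$out/ref.bfa" "$out/aln.map" || return 1
    maq cns2snp "$out/consensus.cns" > "$out/cns.snp"
}
```

It could be called as 'run_maq_stages ref.fasta reads.fastq out', where reads.fastq stands for your own read file.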

Maq is a project hosted by SourceForge.net. The project page is available at http://sourceforge.net/projects/maq/.

 

The environment must be set up before running Maq. The easiest way to do this is with the module commands, as in the example below.

$ module avail maq
---------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------------
maq/0.7.1(default)


$ module load maq

$ module list
Currently Loaded Modulefiles:
1) maq/0.7.1

$ module unload maq

$ module load maq/0.7.1

$ module show maq
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/maq/0.7.1:

module-whatis    Sets up maq 0.7.1
prepend-path     PATH /usr/local/apps/maq/0.7.1/bin
-------------------------------------------------------------------
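
Since the module only prepends a directory to PATH, a quick way to confirm that the environment is set is to check that the Maq programs can be found. This is a small sketch; the helper name have_cmd is hypothetical.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Example: report whether the maq.pl wrapper script is available.
if have_cmd maq.pl; then
    echo "maq.pl found"
else
    echo "maq.pl not found; try 'module load maq'"
fi
```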

Sample Session on Biowulf

MAQ sample files can be copied from:

/usr/local/apps/maq/ref.fasta and /usr/local/apps/maq/calib-36.dat.gz
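
One way to stage these files is sketched below. The function name stage_maq_samples is hypothetical, and the default run directory is taken from the batch script on this page. Because that script refers to calib-36.dat rather than calib-36.dat.gz, the sketch assumes the calibration file should be decompressed after copying.

```shell
# Hypothetical helper: copy the MAQ sample files into a run directory.
# Arguments (both optional): source directory, run directory.
stage_maq_samples() {
    src_dir=${1:-/usr/local/apps/maq}
    run_dir=${2:-$HOME/maq/run1}

    mkdir -p "$run_dir" || return 1
    cp "$src_dir/ref.fasta" "$src_dir/calib-36.dat.gz" "$run_dir/" || return 1

    # Assumption: the demo expects the uncompressed calib-36.dat.
    gunzip -f "$run_dir/calib-36.dat.gz"
}
```

Calling 'stage_maq_samples' with no arguments uses the default locations.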

Submitting a single MAQ batch job

1. Create a script file along the following lines:

#!/bin/bash
# This file name is runMAQ
#
#PBS -N MAQ
#PBS -m be
#PBS -k oe

module load maq
cd /home/$USER/maq/run1
maq.pl demo ref.fasta calib-36.dat

2. Submit the script using the 'qsub' command, e.g.

qsub -l nodes=1:g8 /home/username/runMAQ

Submitting a swarm of MAQ jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run simultaneously.

Set up a swarm command file (e.g., /home/username/cmdfile). Here is a sample file:

maq.pl demo ref1.fasta calib-1.dat
maq.pl demo ref2.fasta calib-2.dat
maq.pl demo ref3.fasta calib-3.dat
....
maq.pl demo refN.fasta calib-N.dat
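
For many inputs, a command file like the one above can be generated rather than typed by hand. This is a minimal sketch that assumes the inputs follow the refN.fasta / calib-N.dat naming shown in the sample; the function name make_maq_cmdfile is hypothetical.

```shell
# Hypothetical helper: write one 'maq.pl demo' command per input pair.
# Arguments: number of input pairs, output file (defaults to ./cmdfile).
make_maq_cmdfile() {
    n=$1
    out=${2:-cmdfile}

    : > "$out"    # create or truncate the command file
    i=1
    while [ "$i" -le "$n" ]; do
        echo "maq.pl demo ref$i.fasta calib-$i.dat" >> "$out"
        i=$((i + 1))
    done
}
```

For example, 'make_maq_cmdfile 3 /home/username/cmdfile' writes the first three lines of the sample file.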

Useful swarm options:

-f: the name of the swarm command file described above (required)
--module: the module to load, which sets up the environment variables for each swarm job
-g: memory in GB needed by each command line in the swarm file (optional)

By default, each command line above runs on one processor core of a node and uses 1 GB of memory. If a command needs more memory, specify the '-g' flag when you submit the job on Biowulf.

For example, if each command line needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:

$ swarm -g 10 -f cmdfile --module maq

For more information regarding running swarm, see swarm.html

Documentation

http://maq.sourceforge.net/index.shtml