Biowulf at the NIH
Mosaik on Biowulf

MOSAIK is a reference-guided assembler comprising four main modular programs:

  • MosaikBuild
  • MosaikAligner
  • MosaikSort
  • MosaikAssembler

MosaikBuild converts various sequence formats into the Mosaik native read format. MosaikAligner performs a pairwise alignment of each read against a specified set of reference sequences. MosaikSort resolves paired-end reads and sorts the alignments by reference sequence coordinates. Finally, MosaikAssembler parses the sorted alignment archive and produces a multiple sequence alignment, which is saved in an assembly file format.
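
The order of the four stages can be sketched as a small dry-run script. This is only an illustration: the file names and the helper functions run and mosaik_pipeline are hypothetical, and the MosaikSort and MosaikAssembler options shown (-in, -out, -ia, -f ace) are assumptions based on the Mosaik 1.x documentation that should be checked against each program's built-in help for the version you load. By default the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch of the four Mosaik stages (file names are placeholders).
# Set DRY_RUN=0 in the environment to execute instead of print.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: mosaik_pipeline READS_FASTA REFERENCE_FASTA OUTPUT_PREFIX
mosaik_pipeline() {
    reads="$1"
    ref="$2"
    prefix="$3"

    # 1. MosaikBuild: convert reference and reads to the native format
    run MosaikBuild -fr "$ref" -oa "${prefix}.ref.dat"
    run MosaikBuild -fr "$reads" -fq "${reads}.qual" -out "${prefix}.reads.dat"

    # 2. MosaikAligner: align each read to the reference
    run MosaikAligner -in "${prefix}.reads.dat" -out "${prefix}.aligned.dat" -ia "${prefix}.ref.dat"

    # 3. MosaikSort: resolve pairs and sort by reference coordinate (assumed options)
    run MosaikSort -in "${prefix}.aligned.dat" -out "${prefix}.sorted.dat"

    # 4. MosaikAssembler: write the multiple alignment in ACE format (assumed options)
    run MosaikAssembler -in "${prefix}.sorted.dat" -ia "${prefix}.ref.dat" -out "${prefix}.assembly" -f ace
}

mosaik_pipeline myreads.fasta myreference.fasta demo
```

Each stage consumes the output of the previous one, so a failure in an early stage leaves nothing for the later stages to read.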

The MOSAIK suite was written by Michael Strömberg of the Marth lab at Boston College.

How To Use

There are multiple versions of Mosaik available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail mosaik

To select a module, type

module load mosaik/[ver]

where [ver] is the version of choice. Loading the module adds the Mosaik programs for that version to your $PATH.
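
To confirm that a module has actually put the programs on your $PATH, a short check along these lines can be run after loading it. This is a minimal sketch; the function name missing_tools is hypothetical and not part of Mosaik or the module system.

```shell
#!/bin/sh
# Print the names of any given programs that are not found on $PATH.
missing_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Strip the leading space before printing
    echo "${missing# }"
}

missing="$(missing_tools MosaikBuild MosaikAligner MosaikSort MosaikAssembler)"
if [ -n "$missing" ]; then
    echo "Not on PATH: $missing (was 'module load mosaik' run?)" >&2
else
    echo "All Mosaik programs found."
fi
```

Placing the same check near the top of a batch script lets the job report a missing module immediately rather than at the first Mosaik command.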

As an example, the following batch script builds the Mosaik archives for the reads and the reference, then aligns the reads to a reference chromosome:

#----- This file is Mosaik.bat -----#
#PBS -mbe
#PBS -N Mosaik
#PBS -e Mosaik.err
#PBS -o Mosaik.out

# Set the environment using mosaik module
module load mosaik

# Set the temporary directory
export MOSAIK_TMP=/scratch

# Build the Mosaik .dat file for reads
MosaikBuild -fr myreads.fasta -fq myreads.fasta.qual -out myreads.dat

# Build the Mosaik .dat file for the reference chromosome
MosaikBuild -fr myreference.fasta -oa myreference.dat

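# Align the reads to the reference chromosome using 8 processors.
# The -j option names a jump database that must already exist
# (built with MosaikJump using the same hash size as -hs).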
MosaikAligner -in myreads.dat -out myreads_aligned.dat -ia myreference.dat -hs 15 -mm 4 -m all -mhp 100 -act 20 -j myjumpdb -p 8

The -p option sets the number of CPUs to use during execution. This batch script uses -p 8, so it must be submitted to a node with at least 8 CPUs, for example a c16 node:

qsub -l nodes=1:c16 Mosaik.bat
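
If the same script may end up on nodes of different sizes, the value passed to -p can be capped at the number of CPUs on the node. The sketch below is one way to do that; clamp_threads is a hypothetical helper, and it assumes that getconf _NPROCESSORS_ONLN reports the CPUs on the node, which is not necessarily the number allocated to the job.

```shell
#!/bin/sh
# Print the smaller of the requested thread count and the available CPUs.
clamp_threads() {
    requested="$1"
    available="$2"
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

# Number of online CPUs on this node; fall back to 1 if getconf is unavailable.
ncpus="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
threads="$(clamp_threads 8 "$ncpus")"
echo "Would run MosaikAligner with -p $threads"
```

In the batch script, the literal 8 in the MosaikAligner command would then be replaced by $threads.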