Velvet on Biowulf

Velvet is a de novo genomic assembler specially designed for short-read sequencing technologies such as Solexa or 454. It was developed by Daniel Zerbino and Ewan Birney at the EBI. [Velvet website]

Velvet takes in short read sequences, removes errors, then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to retrieve the repeated areas between contigs.
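
A single Velvet run is a two-step process: velveth builds the k-mer hash from the reads, and velvetg then builds the de Bruijn graph and writes the assembled contigs (contigs.fa) into the output directory. A minimal interactive sketch; the directory, file name, insert length and coverage values below are placeholders for illustration:

module load velvet
# step 1: hash the paired-end reads with k-mer length 21
velveth /data/user/run1 21 -shortPaired -fasta reads.fasta
# step 2: build the graph and write /data/user/run1/contigs.fa
velvetg /data/user/run1 -ins_length 200 -exp_cov auto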

Velvet is also available on Helix. The advantage of running on Biowulf is the ability to run several simultaneous Velvet jobs, or jobs that require more than 32 GB of RAM.

Running a swarm of Velvet jobs

Set up a swarm command file along the lines of the following:

# --- this is file swarmfile ----
cd /data/user/dir1; velveth . 21 -shortPaired file1.fasta >& out1
cd /data/user/dir2; velveth . 21 -shortPaired file2.fasta >& out2
cd /data/user/dir3; velveth . 21 -shortPaired file3.fasta >& out3
[...]

Submit this swarm with

swarm -g # -f swarmfile --module velvet
where '#' is the number of gigabytes of memory required by a single Velvet process.
'--module velvet' tells swarm to load the 'velvet' module before running each command.
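
Note that the swarm file above runs only the hashing step (velveth). If the assembly step (velvetg) should run in the same job, the two commands can be chained on one line; a sketch, with a placeholder insert length:

# --- velveth and velvetg chained in one swarm command ---
cd /data/user/dir1; (velveth . 21 -shortPaired file1.fasta && velvetg . -ins_length 200) >& out1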

How to estimate memory usage for a Velvet process

Simon Gladman has a memory estimator for Velvet.

Memory = -109635 + 18977*ReadSize + 86326*GenomeSize + 233353*NumReads - 51092*K

The formula gives the answer in kB; divide by 1048576 to get GB.
ReadSize is the read length in bases.
GenomeSize is in millions of bases (Mb).
NumReads is in millions.
K is the k-mer hash value used in velveth.

For example, for K = 31, number of reads = 50 million, read size = 36 bases, and a genome size of 5 Mb, the estimator returns ~10.5 GB of RAM required. You would set up a swarm command file and submit it with

swarm -g 11 -f swarmfile --module velvet
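
As a quick check, the estimate for the example above can be reproduced on the command line with awk, using the constants from the estimator formula:

awk 'BEGIN { readsize=36; genome=5; reads=50; k=31;
             kb = -109635 + 18977*readsize + 86326*genome + 233353*reads - 51092*k;
             printf "%.1f GB\n", kb/1048576 }'
# prints 10.6 GB, i.e. request at least 11 GB with swarm -g 11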

Documentation

Velvet website
Velvet manual (PDF)
Velvet on Helix