Velvet is a de novo genomic assembler designed for short-read sequencing technologies such as Solexa or 454. It was developed by Daniel Zerbino and Ewan Birney at the EBI. [Velvet website]
Velvet takes in short read sequences, removes errors, and then produces high-quality unique contigs. It then uses paired-end read and long read information, when available, to resolve the repeated areas between contigs.
Velvet is also available on Helix. The advantage of running on Biowulf is the ability to run several Velvet jobs simultaneously, or to run jobs that require more than 32 GB of RAM.
Set up a swarm command file along the lines of the following:
# --- this is file swarmfile ----
cd /data/user/dir1; velveth . 21 -shortPaired file1.fasta >& out1
cd /data/user/dir1; velveth . 21 -shortPaired file2.fasta >& out2
cd /data/user/dir1; velveth . 21 -shortPaired file3.fasta >& out3
[...]
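For more than a handful of inputs, the swarm file can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the Biowulf tooling: the function name swarm_lines and the job list are hypothetical, and the directories dir2 and dir3 are assumptions made for illustration. Because velveth writes its output files into the directory it is given ("." here), the sketch gives each input its own working directory so that simultaneous jobs do not overwrite each other.

```python
def swarm_lines(jobs, kmer=21):
    """Return swarm file lines, one velveth command per job.

    jobs is a list of (workdir, fasta_file, log_file) tuples.
    These names are illustrative, not required by swarm or Velvet.
    """
    lines = ["# --- this is file swarmfile ----"]
    for workdir, fasta, log in jobs:
        # Same command shape as the hand-written swarm file:
        # change into the working directory, then hash the reads there.
        lines.append(
            f"cd {workdir}; velveth . {kmer} -shortPaired {fasta} >& {log}"
        )
    return lines


# Hypothetical layout: one directory per input file.
jobs = [(f"/data/user/dir{i}", f"file{i}.fasta", f"out{i}") for i in range(1, 4)]

# Print the lines; redirect the output to a file named swarmfile to use them.
print("\n".join(swarm_lines(jobs)))
```

Redirecting the printed output to a file gives a swarm file that can be submitted in the same way as a hand-written one.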
Submit this swarm with
swarm -g # -f swarmfile --module velvet
Replace '#' with the number of gigabytes of memory required by each job. '--module velvet' tells swarm to load the 'velvet' module before each job.
Simon Gladman has a memory estimator for Velvet.
Memory = -109635 + 18977*ReadSize + 86326*GenomeSize + 233353*NumReads - 51092*K
The formula gives the answer in kB; divide by 1048576 to get GB.
Read size is in bases.
Genome size is in millions of bases (Mb)
Number of reads is in millions
K is the kmer hash value used in velveth
For example, for k = 31, 50 million reads, a read size of 36 bases and a genome size of 5 megabases, the estimator returns 11088965 kB, or about 10.6 GB of RAM required. Rounding up to the next whole gigabyte, you would set up a swarm command file and submit it with
swarm -g 11 -f swarmfile
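The estimate can also be computed with a short script. This is a minimal Python sketch of the formula as quoted above, not Simon Gladman's own estimator; the function names are hypothetical, and the units follow the definitions given with the formula (read size in bases, genome size and number of reads in millions).

```python
import math


def velvet_memory_kb(read_size, genome_size_mb, num_reads_m, k):
    """Estimated Velvet memory in kB, using the formula quoted above.

    read_size      -- read length in bases
    genome_size_mb -- genome size in millions of bases
    num_reads_m    -- number of reads in millions
    k              -- kmer hash value passed to velveth
    """
    return (-109635
            + 18977 * read_size
            + 86326 * genome_size_mb
            + 233353 * num_reads_m
            - 51092 * k)


def kb_to_gb(kb):
    """Convert kB to GB using the divisor given with the formula."""
    return kb / 1048576


# Worked example from the text: k = 31, 50 million 36-base reads, 5 Mb genome.
kb = velvet_memory_kb(read_size=36, genome_size_mb=5, num_reads_m=50, k=31)
gb = kb_to_gb(kb)

# Round up to a whole number of GB for the swarm -g option.
print(f"{kb} kB = {gb:.2f} GB; swarm -g {math.ceil(gb)} -f swarmfile")
```

For the example inputs the rounded-up value is 11, matching the -g 11 used in the command above.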