Description
ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including the human genome builds hg18 and hg19, as well as mouse, worm, fly, yeast and many others).
How to Use
Multiple versions of ANNOVAR are available. The easiest way to select a version is with modules. To see the available modules, type
module avail annovar
To select a module, type
module load annovar/[ver]
where [ver] is the version of choice. This sets your $PATH variable, as well as $ANNOVAR_HOME and $ANNOVAR_DATA.
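If you call ANNOVAR from your own scripts, it helps to check that the module is loaded before doing any work. The Python sketch below is illustrative only: the helper name reference_dir is hypothetical and is not part of ANNOVAR or the Helix installation. It assumes the layout described on this page, with reference files under $ANNOVAR_DATA/{build} and only the hg18 and hg19 builds installed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Builds pre-installed under $ANNOVAR_DATA on Helix (assumption taken from this page).
INSTALLED_BUILDS = ("hg18", "hg19")


def reference_dir(build: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Return $ANNOVAR_DATA/<build>, failing early if the module is not loaded.

    Hypothetical helper for illustration. `env` defaults to os.environ and
    can be replaced by a plain dict for testing.
    """
    env = os.environ if env is None else env
    if build not in INSTALLED_BUILDS:
        raise ValueError(f"build {build!r} is not installed; choose one of {INSTALLED_BUILDS}")
    data = env.get("ANNOVAR_DATA")
    if not data:
        # ANNOVAR_DATA is set by 'module load annovar/[ver]'
        raise RuntimeError("ANNOVAR_DATA is not set; run 'module load annovar/[ver]' first")
    return Path(data) / build
```

Passing a plain dict as env lets you exercise the checks without loading the module; in a real session, reference_dir("hg19") simply reads the variable the module set.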
ANNOVAR takes text-based input files, where each line corresponds to one variant. On each line, the first five space- or tab-delimited columns represent the chromosome, start position, end position, reference nucleotides and observed nucleotides. Here is the example file $ANNOVAR_HOME/example/ex1.human:
1 161003087 161003087 C T comments: rs1000050, ...
1 84647761 84647761 C T comments: rs6576700 ...
1 13133880 13133881 TC - comments: rs59770105, ...
1 11326183 11326183 - AT comments: rs35561142, ...
1 105293754 105293754 A ATAAA comments: rs10552169, ...
1 67478546 67478546 G A comments: rs11209026 ...
2 233848107 233848107 A G comments: rs2241880 ...
16 49303427 49303427 C T comments: rs2066844 ...
16 49314041 49314041 G C comments: rs2066845 ...
16 49321279 49321279 - C comments: rs2066847 ...
13 19661686 19661686 G - comments: rs1801002 ...
13 19695176 20003944 0 - comments: a 342kb ...
Reference files are pre-installed in $ANNOVAR_DATA/{build}, where {build} can be either hg18 or hg19. If other builds are needed, contact staff@helix.nih.gov.
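To generate or check input files programmatically, the column layout described above can be captured in a few lines of Python. This is a minimal sketch, not an ANNOVAR utility: Variant, parse_line and format_line are hypothetical names, and the end-before-start check is a sanity check added here, not a documented ANNOVAR rule. Following the example file, it treats "-" as an ordinary allele string (no reference bases for an insertion, no observed bases for a deletion) and keeps everything after the fifth column as free text.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One line of ANNOVAR input: the five required columns plus any extra text."""
    chrom: str
    start: int
    end: int
    ref: str         # "-" for an insertion, as in the example file
    alt: str         # "-" for a deletion
    extra: str = ""  # everything after column five, e.g. "comments: rs1000050"


def parse_line(line: str) -> Variant:
    """Split on runs of spaces/tabs; keep columns 6+ as one untouched string."""
    fields = line.split(None, 5)
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}: {line!r}")
    chrom, start, end, ref, alt = fields[:5]
    extra = fields[5].rstrip() if len(fields) == 6 else ""
    variant = Variant(chrom, int(start), int(end), ref, alt, extra)
    # Sanity check added for this sketch (assumption, not an ANNOVAR rule).
    if variant.end < variant.start:
        raise ValueError(f"end position precedes start position: {line!r}")
    return variant


def format_line(v: Variant) -> str:
    """Write a Variant back out as a tab-delimited ANNOVAR input line."""
    cols = [v.chrom, str(v.start), str(v.end), v.ref, v.alt]
    if v.extra:
        cols.append(v.extra)
    return "\t".join(cols)
```

To load the whole example file, read ex1.human line by line and call parse_line on every non-blank line; format_line goes the other way when you build input files from another source.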
At the command line, type
[helix]$ cp $ANNOVAR_HOME/example/ex1.human .
[helix]$ annotate_variation.pl --geneanno --dbtype refGene ex1.human $ANNOVAR_DATA/hg18
There is also a custom script, multi_db.pl, written by the Helix staff, that annotates against multiple dbtypes simultaneously and writes all the results to a single tab-delimited file.
multi_db.pl
usage: multi_db.pl --input file --output file [ options ]
  Run annovar against multiple dbs. Currently the build is only hg19.
  Annovar is run against all these dbtypes, and a tab-delimited file
  containing all the results is written.
  options:
    --input        input file for annovar
    --output       output file
    --threads      run with more than one thread
    --autothread   run with as many threads as are available on the node
    --dbtype       pick and choose dbtypes (can be more than one)
                   not selecting a dbtype defaults to all dbtypes (see above for list)
    --data         use a different data dir (default = $ANNOVAR_DATA/hg19)
    --scratch      scratch dir where work is done (default = /scratch)
    -h, --help     show this menu
    -d, --debug    run in debug mode
Here is an example run:
multi_db.pl --input justChr1.data --output multi.out --threads 8 --dbtype avsift --dbtype esp6500_aa --dbtype band
For more information on multi_db.pl, contact staff@helix.nih.gov.
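If you drive multi_db.pl from Python, building the argument list from the options in its usage message avoids shell-quoting mistakes and makes the repeated --dbtype flag easy to generate. This is a sketch under assumptions: multi_db_cmd and run_multi_db are hypothetical helpers, they only emit options listed in the usage message above, and actually running the command requires the annovar module to be loaded so that multi_db.pl is on your $PATH.

```python
import shlex
import subprocess
from typing import Iterable, List, Optional


def multi_db_cmd(
    input_file: str,
    output_file: str,
    dbtypes: Iterable[str] = (),
    threads: Optional[int] = None,
    autothread: bool = False,
    data: Optional[str] = None,
    scratch: Optional[str] = None,
) -> List[str]:
    """Build a multi_db.pl argument list from the options in its usage message.

    An empty `dbtypes` omits --dbtype entirely, so multi_db.pl uses all dbtypes.
    """
    cmd = ["multi_db.pl", "--input", input_file, "--output", output_file]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if autothread:
        cmd.append("--autothread")
    for db in dbtypes:
        # --dbtype is repeated once per selected database
        cmd += ["--dbtype", db]
    if data is not None:
        cmd += ["--data", data]
    if scratch is not None:
        cmd += ["--scratch", scratch]
    return cmd


def run_multi_db(**kwargs) -> None:
    """Echo and run multi_db.pl; raises CalledProcessError if it exits non-zero."""
    cmd = multi_db_cmd(**kwargs)
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling multi_db_cmd("justChr1.data", "multi.out", dbtypes=["avsift", "esp6500_aa", "band"], threads=8) reproduces the example run; leaving dbtypes empty lets multi_db.pl fall back to all dbtypes.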
Cluster Use
The easiest way to run ANNOVAR with multiple VCF files is via swarm. Create a file containing these lines:
convert2annovar -format vcf4 file1.vcf > file1.inp; multi_db.pl --input file1.inp --output file1.out --autothread
convert2annovar -format vcf4 file2.vcf > file2.inp; multi_db.pl --input file2.inp --output file2.out --autothread
convert2annovar -format vcf4 file3.vcf > file3.inp; multi_db.pl --input file3.inp --output file3.out --autothread
convert2annovar -format vcf4 file4.vcf > file4.inp; multi_db.pl --input file4.inp --output file4.out --autothread
Then submit with the --module and -t auto options:
swarm -f swarmfile --module annovar -t auto
This allocates one ANNOVAR job per node. Each job uses all the processors on its node (through the --autothread option) to annotate its VCF file against all the standard databases simultaneously.
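Writing the swarm file by hand becomes tedious with many VCF files. The following Python sketch generates one line per VCF file, using the same convert2annovar and multi_db.pl commands and the same naming scheme (file1.vcf, file1.inp, file1.out) as the swarm file above. swarm_line and write_swarmfile are hypothetical names, and the shlex.quote calls are an addition so that paths containing spaces survive the shell.

```python
import shlex
from pathlib import Path
from typing import Iterable


def swarm_line(vcf: str) -> str:
    """One swarm command: convert a VCF to ANNOVAR input, then annotate it."""
    stem = Path(vcf).with_suffix("")  # file1.vcf -> file1
    inp, out = f"{stem}.inp", f"{stem}.out"
    q = shlex.quote  # leaves simple names unchanged, quotes paths with spaces
    return (
        f"convert2annovar -format vcf4 {q(vcf)} > {q(inp)}; "
        f"multi_db.pl --input {q(inp)} --output {q(out)} --autothread"
    )


def write_swarmfile(vcfs: Iterable[str], path: str = "swarmfile") -> int:
    """Write one newline-terminated line per VCF file; return the line count."""
    lines = [swarm_line(v) for v in vcfs]
    Path(path).write_text("".join(line + "\n" for line in lines))
    return len(lines)
```

For example, passing sorted(str(p) for p in Path(".").glob("*.vcf")) to write_swarmfile creates swarmfile in the current directory, which you then submit with the swarm command shown above.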


