Biowulf at the NIH
ANNOVAR

Description

ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected in diverse genomes (including human genome builds hg18 and hg19, as well as mouse, worm, fly, yeast, and many others).

Citation

If you use ANNOVAR, please cite:

How to Use

There are multiple versions of ANNOVAR available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail annovar

To select a module, type

module load annovar/[ver]

where [ver] is the version of choice. This will set your $PATH variable, as well as $ANNOVAR_HOME and $ANNOVAR_DATA.
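As a quick sanity check after loading a module, the variables named above can be inspected from the shell. The following is only a sketch: the function name check_annovar_env is hypothetical and is not part of ANNOVAR or the module system. It tests only whether $ANNOVAR_HOME and $ANNOVAR_DATA are non-empty.

```shell
# Hypothetical helper (not part of ANNOVAR): report whether the variables
# set by "module load annovar/[ver]" are present in the current shell.
check_annovar_env() {
    missing=0
    for name in ANNOVAR_HOME ANNOVAR_DATA; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set; load an annovar module first" >&2
            missing=1
        else
            echo "$name=$value"
        fi
    done
    return "$missing"
}
```

Running check_annovar_env after module load should print both variables; a non-zero return status means at least one of them is missing.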

ANNOVAR takes text-based input files in which each line corresponds to one variant. On each line, the first five space- or tab-delimited columns give the chromosome, start position, end position, reference nucleotides, and observed nucleotides. Any further columns are carried along as comments. Here is the example file $ANNOVAR_HOME/example/ex1.human:

1	161003087	161003087	C	T	comments: rs1000050, ...
1	84647761	84647761	C	T	comments: rs6576700 ...
1	13133880	13133881	TC	-	comments: rs59770105, ...
1	11326183	11326183	-	AT	comments: rs35561142, ...
1	105293754	105293754	A	ATAAA	comments: rs10552169, ...
1	67478546	67478546	G	A	comments: rs11209026 ...
2	233848107	233848107	A	G	comments: rs2241880 ...
16	49303427	49303427	C	T	comments: rs2066844 ...
16	49314041	49314041	G	C	comments: rs2066845 ...
16	49321279	49321279	-	C	comments: rs2066847 ...
13	19661686	19661686	G	-	comments: rs1801002 ...
13	19695176	20003944	0	-	comments: a 342kb ...
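Before running an annotation, it can help to confirm that a file follows the five-column layout described above. The sketch below uses awk, which splits on spaces and tabs by default. The function name check_annovar_input is hypothetical, and the checks are deliberately minimal: column count, integer positions, and start not greater than end. It does not validate chromosome names or alleles, and ANNOVAR itself may accept or reject lines on other grounds.

```shell
# Hypothetical helper (not part of ANNOVAR): flag lines that do not match
# the basic layout: chromosome, start, end, reference, observed.
check_annovar_input() {
    awk '
        NF < 5 {
            printf "line %d: fewer than 5 columns\n", NR; bad = 1; next
        }
        $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: non-numeric position\n", NR; bad = 1; next
        }
        $2 + 0 > $3 + 0 {
            printf "line %d: start is greater than end\n", NR; bad = 1
        }
        # Exit status is 0 when no problem was found, 1 otherwise.
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_annovar_input ex1.human prints one message per suspicious line and returns a non-zero status if any were found.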

Reference files are pre-installed in $ANNOVAR_DATA/{build}, where {build} can be either hg18 or hg19. If other builds are needed, contact staff@helix.nih.gov.

At the command line, type

[helix]$ cp $ANNOVAR_HOME/example/ex1.human .
[helix]$ annotate_variation.pl --geneanno --dbtype refGene ex1.human $ANNOVAR_DATA/hg18

There is a custom script, written by the Helix staff, that can annotate against multiple dbtypes simultaneously, and output the results into a single, tab-delimited file.

multi_db.pl 

usage: multi_db.pl --input file --output file [ options ]

Run annovar against multiple dbs.  Currently the build is only hg19.
Annovar is run against all these dbtypes, and a tab-delimited file containing
all the results is written.
    
options:

  --input      input file for annovar
  --output     output file
  --threads    run with more than one thread
  --autothread run with as many threads as are available on the node
  --dbtype     pick and choose dbtypes (can be more than one)
                 not selecting a dbtype defaults to all dbtypes
                 (see above for list)
  --data       use a different data dir
                 (default = $ANNOVAR_DATA/hg19)
  --scratch    scratch dir where work is done
                 (default = /scratch)
  -h, --help   show this menu
  -d, --debug  run in debug mode

Here is an example run:

multi_db.pl --input justChr1.data --output multi.out --threads 8 --dbtype avsift --dbtype esp6500_aa --dbtype band

For more information on multi_db.pl, contact staff@helix.nih.gov.

Cluster Use

The easiest way to run ANNOVAR with multiple VCF files is via swarm. Create a file containing these lines:

convert2annovar.pl -format vcf4 file1.vcf > file1.inp; multi_db.pl --input file1.inp --output file1.out --autothread
convert2annovar.pl -format vcf4 file2.vcf > file2.inp; multi_db.pl --input file2.inp --output file2.out --autothread
convert2annovar.pl -format vcf4 file3.vcf > file3.inp; multi_db.pl --input file3.inp --output file3.out --autothread
convert2annovar.pl -format vcf4 file4.vcf > file4.inp; multi_db.pl --input file4.inp --output file4.out --autothread

Then submit with the --module and -t auto options:

swarm -f swarmfile --module annovar -t auto

This allocates one ANNOVAR job per node and uses all the processors on that node to annotate the VCF file against all the standard databases simultaneously.
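With many VCF files, the swarm file can be generated rather than typed by hand. The following is a sketch only: the function name make_swarmfile is hypothetical, and it assumes the file names end in .vcf and contain no spaces. Each line it emits has the same form as the lines shown above.

```shell
# Hypothetical helper: print one swarm command line per .vcf file found
# in the given directory (default: the current directory).
make_swarmfile() {
    dir="${1:-.}"
    for vcf in "$dir"/*.vcf; do
        # Skip the unexpanded pattern when no .vcf files exist.
        [ -e "$vcf" ] || continue
        base="${vcf%.vcf}"
        printf '%s\n' "convert2annovar.pl -format vcf4 $vcf > $base.inp; multi_db.pl --input $base.inp --output $base.out --autothread"
    done
}
```

A typical use would be make_swarmfile . > swarmfile, followed by the swarm command shown above.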

Notes On Reference Files

Some of the reference files for ANNOVAR are updated on a regular basis. The environment variable $ANNOVAR_DATA points to the reference files as they existed when that version of ANNOVAR was installed, so some of those files are not current. To use the most recent reference files, use /fdb/annovar/current as the base directory for reference files. For example,

annotate_variation.pl --geneanno --dbtype refGene ex1.human /fdb/annovar/current/hg19

Alternatively, the environment variable $ANNOVAR_DATA_CURRENT can be used instead:

annotate_variation.pl --geneanno --dbtype refGene ex1.human $ANNOVAR_DATA_CURRENT/hg19

Please note that the reference files in /fdb/annovar/current are subject to change. This means that identical ANNOVAR jobs run on different days may give different results. For more information, contact staff@helix.nih.gov.
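Because the current reference files can change between runs, it may be worth recording what was in the data directory when a job ran. The sketch below appends the date, the directory path, and a file listing to a log file. The function name record_annovar_refs and the log format are hypothetical. A listing shows only file names, sizes, and modification times, so it is a rough record rather than a guarantee of identical content.

```shell
# Hypothetical helper: append a snapshot of a reference directory to a log
# so that a later run can be compared against it.
record_annovar_refs() {
    data_dir="$1"
    log="$2"
    {
        echo "date: $(date '+%Y-%m-%d')"
        echo "data_dir: $data_dir"
        ls -l "$data_dir"
    } >> "$log"
}
```

For example, record_annovar_refs "$ANNOVAR_DATA_CURRENT/hg19" refs.log could be run immediately before annotate_variation.pl in the same job.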

Documentation