The XHMM C++ software suite calls copy number variation (CNV) from next-generation sequencing projects that used exome capture (or, more generally, targeted sequencing).
XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype CNVs from normalized read-depth data produced by targeted sequencing experiments.
XHMM was explicitly designed to be used with targeted exome sequencing at high coverage (at least 60x - 100x) using Illumina HiSeq (or similar) sequencing of at least ~50 samples. However, no part of XHMM explicitly requires these particular experimental conditions, just high coverage of genomic regions for many samples.
How to Use
XHMM is provided through environment modules, so it must be loaded with "module load XHMM" before the xhmm command is available.
Create a batch script like this:
    #!/bin/bash
    #
    # this file is myjob.sh
    #
    #PBS -N XHMM
    #PBS -m be
    #PBS -o XHMM.o
    #PBS -e XHMM.e
    cd $PBS_O_WORKDIR
    module load XHMM
    xhmm -p params.txt
A params.txt will need to be created. Here is an example:
1e-8 6 70 -3 1.00 0 1.00 3 1.00
A parameters file consists of the following 9 tab-delimited values, in this order (the example above lists them left to right):
- Exome-wide CNV rate
- Mean number of targets in CNV
- Mean distance between targets within CNV (in KB)
- Mean of DELETION z-score distribution
- Standard deviation of DELETION z-score distribution
- Mean of DIPLOID z-score distribution
- Standard deviation of DIPLOID z-score distribution
- Mean of DUPLICATION z-score distribution
- Standard deviation of DUPLICATION z-score distribution
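
Because the values must be separated by tabs, typing the file by hand can go wrong if an editor converts tabs to spaces. The following is a minimal sketch of one way to generate and sanity-check params.txt from the shell, using the example values from this section. The variable names are illustrative only and are not part of XHMM. The check confirms only the field count, not whether the values are suitable for your data.

```shell
#!/bin/bash
# Sketch: write the example values to params.txt as one tab-delimited line,
# then confirm the line holds exactly 9 fields.
# Variable names below are hypothetical labels, not XHMM option names.

cnv_rate=1e-8        # exome-wide CNV rate
mean_targets=6       # mean number of targets in CNV
mean_dist_kb=70      # mean distance between targets within CNV (in KB)
del_mean=-3          # mean of DELETION z-score distribution
del_sd=1.00          # standard deviation of DELETION z-score distribution
dip_mean=0           # mean of DIPLOID z-score distribution
dip_sd=1.00          # standard deviation of DIPLOID z-score distribution
dup_mean=3           # mean of DUPLICATION z-score distribution
dup_sd=1.00          # standard deviation of DUPLICATION z-score distribution

# printf inserts a literal tab between each value and ends with a newline.
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    "$cnv_rate" "$mean_targets" "$mean_dist_kb" \
    "$del_mean" "$del_sd" "$dip_mean" "$dip_sd" "$dup_mean" "$dup_sd" \
    > params.txt

# Count the tab-separated fields on the first line.
n_fields=$(awk -F'\t' 'NR == 1 { print NF }' params.txt)
if [ "$n_fields" -ne 9 ]; then
    echo "params.txt has $n_fields fields, expected 9" >&2
    exit 1
fi
echo "params.txt OK ($n_fields fields)"
```

Run this in the directory from which the job will be submitted, so that the "xhmm -p params.txt" line in the batch script finds the file.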
See the documentation below for more information.