Biowulf at the NIH
RSS Feed
XHMM on Biowulf

Description

The XHMM C++ software suite was written to call copy number variation (CNV) from next-generation sequencing projects, where exome capture was used (or targeted sequencing, more generally).

XHMM uses principal component analysis (PCA) normalization and a hidden Markov model (HMM) to detect and genotype copy number variation (CNV) from normalized read-depth data from targeted sequencing experiments.

XHMM was explicitly designed to be used with targeted exome sequencing at high coverage (at least 60x - 100x) using Illumina HiSeq (or similar) sequencing of at least ~50 samples. However, no part of XHMM explicitly requires these particular experimental conditions, just high coverage of genomic regions for many samples.

How to Use

XHMM uses environment modules.

Create a batch script like this:

#!/bin/bash
#
# this file is myjob.sh
#
#PBS -N XHMM
#PBS -m be
#PBS -o XHMM.o
#PBS -e XHMM.e

cd $PBS_O_WORKDIR
module load XHMM
xhmm -p params.txt

A params.txt will need to be created. Here is an example:

1e-8	6	70	-3	1.00	0	1.00	3	1.00

A parameters file consists of the following 9 values (tab-delimited):

  1. Exome-wide CNV rate
  2. Mean number of targets in CNV
  3. Mean distance between targets within CNV (in KB)
  4. Mean of DELETION z-score distribution
  5. Standard deviation of DELETION z-score distribution
  6. Mean of DIPLOID z-score distribution
  7. Standard deviation of DIPLOID z-score distribution
  8. Mean of DUPLICATION z-score distribution
  9. Standard deviation of DUPLICATION z-score distribution

See the documentation below for more information.

Documentation