Biowulf at the NIH
ChromoPainter on Biowulf

ChromoPainter is a tool for finding haplotypes in sequence data. Each individual is "painted" as a combination of all other sequences, and the program can output a range of features derived from this painting.

It is useful for generating high-quality Principal Components Analysis (PCA) from dense data, creating data summaries for fineSTRUCTURE, dating admixture events, and much more.

ChromoCombine is a tool to help manage the large number of files generated when running ChromoPainter in parallel on a large number of separate compute nodes. You can place all these files into a single directory and ChromoCombine will calculate the correct way to combine them. It also calculates the effective number of chunks (i.e. the c value).
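Gathering the per-node output into one directory can be scripted. The following is a minimal sketch, not part of ChromoCombine itself: collect_outputs is a hypothetical helper, and it assumes that each parallel job wrote its files into its own directory, that the files of interest end in .out, and that every job used a distinct output prefix so that no two files share a name.

```shell
#!/bin/bash
# collect_outputs DEST SRC_DIR...
# Hypothetical helper: copy every *.out file found directly inside each
# source directory into DEST, creating DEST if needed.
# Assumption: file names are unique across jobs; a later file with the
# same name would overwrite an earlier one.
collect_outputs() {
    local dest=$1
    shift
    mkdir -p "$dest"
    local src
    for src in "$@"; do
        # -maxdepth 1 restricts the search to the job directory itself
        find "$src" -maxdepth 1 -type f -name '*.out' -exec cp {} "$dest"/ \;
    done
}
```

For example, `collect_outputs combined run1 run2 run3` would copy the .out files from three hypothetical job directories into combined/, which can then be handed to chromocombine.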

fineSTRUCTURE is a fast and powerful algorithm for identifying population structure using dense sequencing data. By using the output of ChromoPainter as a (nearly) sufficient summary statistic, it is able to perform model-based Bayesian clustering on large datasets, including full resequencing data, and can handle thousands of individuals. Full assignment uncertainty is given.

The pipeline for every analysis is:
ChromoPainter -> ChromoCombine -> fineSTRUCTURE MCMC -> fineSTRUCTURE tree creation.

All three programs were developed by Daniel John Lawson, Garrett Hellenthal, Simon Myers, and Daniel Falush. For details, see the ChromoPainter website and the fineSTRUCTURE paper.

They are installed in /usr/local/chromopainter. The executables are in /usr/local/chromopainter/bin. Users should add the following to their .bashrc or .cshrc files:

# bash users (.bashrc)
export PATH=/usr/local/chromopainter/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/gsl/64/lib:$LD_LIBRARY_PATH

# csh or tcsh users (.cshrc)
setenv PATH /usr/local/chromopainter/bin:$PATH
setenv LD_LIBRARY_PATH /usr/local/gsl/64/lib:$LD_LIBRARY_PATH
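
After editing the startup file and opening a new shell, it can be worth confirming that the three executables are actually found. The following is a minimal sketch for bash users; check_tools is a hypothetical helper name, and the program names are the ones used in the commands on this page.

```shell
#!/bin/bash
# check_tools PROG...
# Hypothetical helper: report whether each named program is on PATH.
# Returns 0 if all were found, 1 if at least one is missing.
check_tools() {
    local status=0
    local prog
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "$prog: found at $(command -v "$prog")"
        else
            echo "$prog: not found on PATH"
            status=1
        fi
    done
    return "$status"
}

check_tools chromopainter chromocombine finestructure \
    || echo "Check the PATH setting in your startup file."
```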

Batch job on Biowulf

Set up a batch script along the following lines:

#!/bin/bash
#PBS -m be

cd /data/username/chromodata
chromopainter -g geno.filein -r reconmap.filein -f donorlist.filein [options]
chromocombine [options] -o output file1 file2 file3 ...
finestructure [options] datafile initialpopfile > outputfile

Typing 'chromopainter -h', 'chromocombine', or 'finestructure' on the command line will print usage info, including options and required inputs, to the screen.

Running a swarm of ChromoPainter jobs

Set up a swarm command file along the following lines:

# --- this file is called chromo.swarm --------------------------
chromopainter -g geno1.filein -r reconmap1.filein -f donorlist1.filein [options]
chromopainter -g geno2.filein -r reconmap2.filein -f donorlist2.filein [options]
chromopainter -g geno3.filein -r reconmap3.filein -f donorlist3.filein [options]

Submit this with:
swarm -f chromo.swarm
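
Because the lines of the swarm command file follow a regular pattern, the file can be generated with a short loop instead of being typed by hand. The following is a minimal sketch, assuming the input files are numbered as in the example above (geno1.filein, reconmap1.filein, donorlist1.filein, and so on). N_SETS and SWARM_FILE are illustrative variable names, not part of swarm or ChromoPainter.

```shell
#!/bin/bash
# Sketch: write one chromopainter command per numbered input set.
# N_SETS and SWARM_FILE are hypothetical names chosen for this example.
N_SETS=3
SWARM_FILE=chromo.swarm

# Start from an empty command file
: > "$SWARM_FILE"

i=1
while [ "$i" -le "$N_SETS" ]; do
    # Append any further chromopainter options to the end of this line
    echo "chromopainter -g geno${i}.filein -r reconmap${i}.filein -f donorlist${i}.filein" >> "$SWARM_FILE"
    i=$((i + 1))
done

# Show the generated file for inspection before submitting
cat "$SWARM_FILE"
```

The generated file can then be submitted with swarm -f chromo.swarm as described above.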