Biowulf at the NIH
GERMLINE on Biowulf

GERMLINE is a program for discovering long shared segments of Identity by Descent (IBD) between pairs of individuals in a large population. It takes as input genotype or haplotype marker data for individuals (as well as an optional known pedigree) and generates a list of all pairwise segmental sharing.

GERMLINE uses a novel hashing and extension algorithm that identifies segments in haplotype data in time proportional to the number of individuals. At present, GERMLINE can run on phased or unphased data. Performance is much better with phased data, and phasing followed by a GERMLINE run is still significantly faster than comparable IBD algorithms. GERMLINE can identify shared segments of any specified length and can allow for any number of mismatching markers.

GERMLINE was developed in Itsik Pe'er's lab at Columbia University. GERMLINE website.

GERMLINE is not a parallel program. Single GERMLINE jobs should be run interactively on the Biowulf interactive nodes or Helix. If you have multiple GERMLINE jobs to run, the swarm utility is recommended.

Submitting a swarm of GERMLINE jobs

The swarm program is a convenient way to submit large numbers of jobs all at once instead of manually submitting them one by one.

1. For each set of input data, create a file containing the input that germline reads from standard input, as below.

-----------/data/user/myproject/run1 ----------
1
/data/user/myproject/CEU.22.map
/data/user/myproject/CEU.22.ped
/data/user/myproject/generated
-------------------------------------------------
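When there are many input sets, the run files can be generated with a short loop rather than written by hand. The following is a minimal sketch, not a Biowulf-provided tool. It assumes hypothetical per-chromosome inputs named CEU.<chr>.map and CEU.<chr>.ped, and it gives each run its own output name (generated.<chr>) so that results do not overwrite each other. PROJECT defaults to a scratch directory so the sketch can be tried safely.

```shell
#!/bin/sh
# Sketch: write one GERMLINE stdin file (run1, run2, ...) per chromosome.
# Assumption: inputs are named CEU.<chr>.map / CEU.<chr>.ped (hypothetical).
# On Biowulf, PROJECT would be set to something like /data/user/myproject.
PROJECT="${PROJECT:-$(mktemp -d)}"

n=0
for chr in 20 21 22; do
    n=$((n + 1))
    # Same four lines as the run1 example above: the leading "1",
    # then the map file, the ped file, and the output name.
    printf '%s\n' \
        "1" \
        "$PROJECT/CEU.$chr.map" \
        "$PROJECT/CEU.$chr.ped" \
        "$PROJECT/generated.$chr" \
        > "$PROJECT/run$n"
done
```

Each resulting file has the same four-line layout as the run1 example and can be redirected into germline in the same way.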

2. Now prepare the swarm command file (named cmdfile below), e.g.

------- cmdfile -------------
germline -bits 50 -min_m 1 -err_hom 2 -err_het 0 < /data/user/myproject/run1
germline -bits 50 -min_m 1 -err_hom 2 -err_het 0 < /data/user/myproject/run2
germline -bits 50 -min_m 1 -err_hom 2 -err_het 0 < /data/user/myproject/run3
.....
....
---- end of cmdfile ---------
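The command file can also be built from the run files already on disk. The sketch below defines a hypothetical helper, make_cmdfile, that writes one germline line per run* file in a directory. The germline options are copied from the example above; the helper name and its arguments are illustrative only.

```shell
#!/bin/sh
# make_cmdfile DIR OUT: write one germline command per DIR/run* file into OUT.
# Hypothetical helper, not part of swarm or GERMLINE.
make_cmdfile() {
    dir=$1
    out=$2
    : > "$out"                      # start with an empty command file
    for run in "$dir"/run*; do
        [ -f "$run" ] || continue   # skip the literal pattern if nothing matches
        # Options copied from the cmdfile example above.
        echo "germline -bits 50 -min_m 1 -err_hom 2 -err_het 0 < $run" >> "$out"
    done
}

# Example call (placeholder paths):
#   make_cmdfile /data/user/myproject cmdfile
```

The shell expands run* in lexical order, so run10 sorts before run2. This does not matter to swarm, since each line is an independent job.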

3. If each germline process requires less than 1 GB of memory, submit this to the batch system with the command:

swarm -f cmdfile

If each germline process requires more than 1 GB of memory, use

swarm -g # -f cmdfile
where '#' is the number of Gigabytes of memory required by each germline process.
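Before submitting a large swarm, it can save queue time to confirm that every run file points at inputs that exist. The following sketch defines a hypothetical helper, check_run_file, which assumes the four-line layout shown in the run1 example (map file on line 2, ped file on line 3).

```shell
#!/bin/sh
# check_run_file FILE: return 0 if the map and ped files named on lines 2
# and 3 of FILE exist; otherwise report each missing file and return 1.
# Hypothetical helper; assumes the run-file layout from the run1 example.
check_run_file() {
    status=0
    for path in "$(sed -n 2p "$1")" "$(sed -n 3p "$1")"; do
        if [ ! -f "$path" ]; then
            echo "$1: missing input $path" >&2
            status=1
        fi
    done
    return $status
}

# Example call (placeholder paths):
#   for f in /data/user/myproject/run*; do check_run_file "$f"; done
```

Any run file reported here can be corrected before the swarm is submitted.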

Submitting a single GERMLINE batch job

Single GERMLINE jobs are typically submitted only for debugging purposes.

1. Create a script file which contains the GERMLINE commands as below:

---------- /data/user/plink/run1/script --------------
#!/bin/csh -v
#PBS -N germline
#PBS -m be
#PBS -k oe
cd /data/user/myproject/
germline -bits 50 -min_m 1 -err_hom 2 -err_het 0 <<EOF
1
/data/user/myproject/CEU.22.map
/data/user/myproject/CEU.22.ped
/data/user/myproject/generated
EOF
----------------- end of script ----------------------

2. Now submit the script using the 'qsub' command, e.g.

qsub -l nodes=1 /data/user/plink/run1/script

Documentation

Germline website and documentation.