Multifactor dimensionality reduction (MDR) was previously introduced as a non-parametric, model-free method for detecting gene-gene and gene-environment interactions. Parallel multifactor dimensionality reduction (pMDR) is a new implementation of the MDR algorithm that scales to extremely large data sets, dramatically decreases single-processor runtimes, and can use a parallel software framework to run in a clustered computing environment, reducing runtime further.
The improved algorithm of pMDR allows for an unlimited number of variable states (for haplotype encoding) and an unlimited number of individuals. The number of variables and the order of interaction to analyze (2-locus interactions, 3-locus interactions, etc.) are limited only by machine memory and computation time. These improvements allow the analysis of higher-order interactions for small datasets and make two-locus interactions computationally feasible for very large datasets.
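To get a feel for why the interaction order is bounded by computation time, the sketch below counts how many k-locus combinations exist among n variables. It assumes an exhaustive search over all combinations, as in the standard description of MDR; the function name n_choose_k is hypothetical and is not part of pMDR.

```shell
#!/bin/bash
# n_choose_k N K (hypothetical helper, not part of pMDR):
# print the number of K-variable combinations among N variables.
n_choose_k() {
    local n=$1 k=$2 result=1 i
    for (( i = 1; i <= k; i++ )); do
        # Multiply before dividing: after step i the value equals
        # C(n-k+i, i), so the integer division is always exact.
        result=$(( result * (n - k + i) / i ))
    done
    echo "$result"
}

n_choose_k 1000 2   # two-locus combinations among 1000 variables
n_choose_k 1000 3   # three-locus combinations among 1000 variables
```

For 1000 variables this prints 499500 and 166167000, so moving from two-locus to three-locus analysis multiplies the number of candidate models by more than 300.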
pMDR was developed in the Ritchie Lab.
This example uses the sample files provided with the pMDR program. They can be copied from /usr/local/mdr/example/. The appropriate OpenMPI version and pMDR are set up using the 'module load pMDR' command as in the example below.
Set up a batch script along the following lines:
#!/bin/bash
#
# this file is myjob.sh
#
#PBS -N pMDRjob
#PBS -m be
#PBS -k oe
#
cd /data/username/mydir
module load pMDR
`which mpirun` -machinefile $PBS_NODEFILE -np $np `which pMDR` xor.cfg
Submit this job to the batch system with:
qsub -v np=4 -l nodes=2:c2 myjob.sh
(to run on 4 CPUs on 2 single-core nodes)
or
qsub -v np=8 -l nodes=2:c4 myjob.sh
(to run on 8 CPUs on 2 dual-core nodes)
etc.
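In both commands above, np equals the number of nodes multiplied by the number in the cN node property. The sketch below keeps the two values in step by deriving np from the node count. It assumes that the number after 'c' is the core count per node, which is consistent with both examples but is not stated explicitly here. The helper name build_qsub_cmd is hypothetical, and it only prints the command rather than submitting it.

```shell
#!/bin/bash
# build_qsub_cmd NODES CORES SCRIPT (hypothetical helper):
# print a qsub command whose np matches NODES x CORES.
# Assumption: the cN node property means N cores per node.
build_qsub_cmd() {
    local nodes=$1 cores=$2 script=$3
    local np=$(( nodes * cores ))
    echo "qsub -v np=${np} -l nodes=${nodes}:c${cores} ${script}"
}

build_qsub_cmd 2 2 myjob.sh   # prints the first command shown above
build_qsub_cmd 2 4 myjob.sh   # prints the second command shown above
```

Printing the command first makes it easy to check the request before pasting it into the shell.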
The batch system will send email when the job starts and ends ('#PBS -m be' in the batch script). Because of '#PBS -k oe', the standard output and error files are kept and will appear in the user's home directory.
ls -l /home/user/pMDR*
-rw------- 1 user user   0 Jun 8 11:55 pMDRjob.e1330139
-rw------- 1 user user 241 Jun 8 11:55 pMDRjob.o1330139
If the job runs with no errors, the standard error file (in this case, pMDRjob.e1330139) should have size 0.
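The size check can also be scripted. The following is a minimal sketch using the shell's standard -s file test; the function name check_stderr_empty and its messages are hypothetical, not something pMDR or the batch system provides.

```shell
#!/bin/bash
# check_stderr_empty FILE (hypothetical helper):
# report whether a job's standard error file exists and is empty.
check_stderr_empty() {
    local errfile=$1
    if [ ! -e "$errfile" ]; then
        # The job may not have started yet, or the name is wrong.
        echo "missing: $errfile"
        return 2
    elif [ -s "$errfile" ]; then
        # -s is true when the file exists and has size greater than 0.
        echo "errors reported in $errfile"
        return 1
    else
        echo "no errors: $errfile"
        return 0
    fi
}
```

For the job above, it would be called as: check_stderr_empty /home/user/pMDRjob.e1330139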
As always, jobs should be monitored with 'jobload username', or one of the other monitoring tools. If you run a large pMDR job and submit it to multiple nodes, be sure that the job loads are over 80%. Otherwise, you are wasting processors. If you have any questions about the efficiency of your pMDR job or how many nodes to use, please contact the Helix/Biowulf staff at firstname.lastname@example.org.