Biowulf at the NIH
MACS on Biowulf

Model-based Analysis of ChIP-Seq (MACS) predicts protein binding sites from ChIP-Seq reads produced by short-read sequencers such as the Genome Analyzer (Illumina / Solexa). MACS empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to capture local biases in the genome sequence, which makes its predictions more sensitive and robust. In its authors' comparisons, MACS performs favorably against existing ChIP-Seq peak-finding algorithms, and it can be used for ChIP-Seq with or without control samples.
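As a rough illustration of the "dynamic Poisson" idea, based on our reading of the original MACS publication (the exact windows differ between versions; MACS 2 exposes the local windows as the --slocal and --llocal options of callpeak): for each candidate peak, MACS takes the largest of several background rates as the local rate and scores the observed ChIP tag count k against a Poisson tail. Here lambda_BG is the genome-wide background expectation for the window, and lambda_1k, lambda_5k and lambda_10k are estimated from control tags in 1, 5 and 10 kb windows centred on the peak (without a control, lambda_1k is dropped and the others come from the ChIP sample).

```latex
\lambda_{\text{local}} = \max\left(\lambda_{\text{BG}},\ \lambda_{1\text{k}},\ \lambda_{5\text{k}},\ \lambda_{10\text{k}}\right)

p = P(X \ge k \mid \lambda_{\text{local}})
  = 1 - \sum_{i=0}^{k-1} \frac{e^{-\lambda_{\text{local}}}\,\lambda_{\text{local}}^{\,i}}{i!}
```

Taking the maximum makes the test conservative wherever the control shows local enrichment, which is how local biases are absorbed.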

MACS is written by Yong Zhang and Tao Liu from Xiaole Shirley Liu's Lab.

The environment variables must be set before MACS can be run. The easiest way to do this is with the module commands, as in the example below.

$ module avail macs
---------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------------
macs/1.4.1           macs/1.4.2           macs/2.0.10(default)


$ module load macs

$ module list
Currently Loaded Modulefiles:
1) macs/2.0.10

$ module unload macs

$ module load macs/1.4.2

$ module show macs
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/macs/2.0.10:

module-whatis    Sets up macs 2.0.10
prepend-path     PYTHONPATH /usr/local/Python/2.7.2/lib/python2.7/site-packages
prepend-path     PATH /usr/local/Python/2.7.2/bin
-------------------------------------------------------------------

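The job scripts on this page call macs2, the command provided by MACS 2. If you load one of the 1.4.x modules instead, the upstream 1.4 releases name the command macs14 and take -t, -c and -n/--name directly, without the callpeak subcommand. Whether these particular modules follow that naming is an assumption; check with module show or by listing the module's bin directory. A minimal sketch of a hypothetical helper, macs_exe_for_module, that picks the command name from the module name:

```shell
#!/bin/bash
# Hypothetical helper: print the MACS command name expected for a module name.
# ASSUMPTION: macs/1.4.x modules provide `macs14` (upstream 1.4 naming) and
# macs/2.x modules provide `macs2`; verify on the system before relying on it.
macs_exe_for_module() {
    local version="${1#macs}"   # "macs/1.4.2" -> "/1.4.2"; "macs" -> ""
    version="${version#/}"      # "/1.4.2" -> "1.4.2"
    case "$version" in
        1.4*)  echo "macs14" ;;
        2*|"") echo "macs2" ;;  # bare "macs" loads the default (2.0.10 in the listing above)
        *)     echo "unrecognized MACS module: $1" >&2; return 1 ;;
    esac
}
```

For example, after module load macs/1.4.2, running command -v "$(macs_exe_for_module macs/1.4.2)" confirms that the expected command is on your PATH.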
Submitting a single batch job

A sample dataset, FoxA1_ChIP-seq.tar.gz, can be copied from

/usr/local/apps/macs/FoxA1_ChIP-seq.tar.gz

1. Copy the sample file into your own directory and unpack it:

$ mkdir -p /data/$USER/macs/run1
$ cd /data/$USER/macs/run1
$ cp /usr/local/apps/macs/FoxA1_ChIP-seq.tar.gz .
$ tar -xzf FoxA1_ChIP-seq.tar.gz

2. Create a script file along the following lines:

#!/bin/bash
# This file is runMacs
#
#PBS -N Macs
#PBS -m be
#PBS -k oe

# load the default version of MACS (macs/2.0.10 in the module listing above)
module load macs
cd /data/$USER/macs/run1
macs2 callpeak -t Treatment_tags.bed -c Input_tags.bed --name test

3. Submit the script using the 'qsub' command on Biowulf.

$ qsub -l nodes=1 /data/username/runMacs
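Steps 1 to 3 can also be scripted. The job script reads Treatment_tags.bed and Input_tags.bed from the run directory, so the archive has to be unpacked there. The sketch below uses two hypothetical helpers, setup_run_dir and write_macs_job (illustrative names, not Biowulf tools). It assumes the archive unpacks the BED files directly into the run directory; list its contents with tar -tzf first and adjust the paths if they land in a subdirectory.

```shell
#!/bin/bash
# Hypothetical helpers scripting steps 1-3 above; the names are illustrative.

# Step 1: create the run directory (with parents), copy the archive in, unpack it.
# ASSUMPTION: the archive unpacks the BED files into the run directory itself.
setup_run_dir() {
    local rundir="$1" tarball="$2"
    mkdir -p "$rundir" || return 1
    cp "$tarball" "$rundir/" || return 1
    (cd "$rundir" && tar -xzf "$(basename "$tarball")")
}

# Step 2: write a PBS job script equivalent to runMacs for one treatment/control
# pair. The unquoted heredoc expands the shell variables when the file is written.
write_macs_job() {
    local script="$1" rundir="$2" treat="$3" ctrl="$4" name="$5"
    cat > "$script" <<EOF
#!/bin/bash
#PBS -N Macs
#PBS -m be
#PBS -k oe

module load macs
cd "$rundir" || exit 1
macs2 callpeak -t "$treat" -c "$ctrl" --name "$name"
EOF
}
```

For example: setup_run_dir /data/$USER/macs/run1 /usr/local/apps/macs/FoxA1_ChIP-seq.tar.gz, then write_macs_job /data/$USER/runMacs /data/$USER/macs/run1 Treatment_tags.bed Input_tags.bed test, and finally qsub -l nodes=1 /data/$USER/runMacs. Generating the script from a function makes it easy to write one job per sample pair.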

MACS is single-threaded, so the default qsub submission is sufficient; requesting a node with more cores gives no speedup. If your MACS job needs more than 1 GB of memory, you may need to submit it to a larger-memory node. Use 'freen' to see the types of nodes available.
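One way to find out whether a run needs more than 1 GB is to time a representative job with GNU time, for example /usr/bin/time -v -o time.log macs2 callpeak ..., which records the peak memory as "Maximum resident set size (kbytes)". The hypothetical helper below (peak_mem_gb, an illustrative name) reads that line and rounds up to whole gigabytes. It assumes GNU time wrote the log; the shell's built-in time keyword has no -v option. The result is also a reasonable starting value for swarm's -g flag.

```shell
#!/bin/bash
# Hypothetical helper: print the peak memory recorded in a GNU `time -v` log,
# rounded up to whole gigabytes (1 GB taken as 1048576 KB).
peak_mem_gb() {
    local log="$1" kb gb
    # GNU time writes a line such as "	Maximum resident set size (kbytes): 51236"
    kb=$(awk -F': *' '/Maximum resident set size \(kbytes\)/ {print $2; exit}' "$log")
    [ -n "$kb" ] || return 1             # no memory line found
    gb=$(( (kb + 1048575) / 1048576 ))   # integer ceiling division
    [ "$gb" -ge 1 ] || gb=1              # never report less than 1 GB
    echo "$gb"
}
```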

Submitting a swarm of macs jobs

1. Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile) with one command per line. Here is a sample file:

macs2 callpeak -t file1.bed -c Input_tags.bed --name test1
macs2 callpeak -t file2.bed -c Input_tags.bed --name test2
macs2 callpeak -t file3.bed -c Input_tags.bed --name test3
macs2 callpeak -t file4.bed -c Input_tags.bed --name test4
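With many samples, the command file can be generated rather than typed. The sketch below is a hypothetical helper (make_swarm_file, an illustrative name) that writes one macs2 callpeak line per treatment BED file against a shared control, deriving each --name from the file's basename; that naming rule is our own choice, not a MACS or swarm requirement.

```shell
#!/bin/bash
# Hypothetical helper: write a swarm command file with one macs2 callpeak line
# per treatment BED file, all sharing the same control file.
# Usage: make_swarm_file OUTFILE CONTROL.bed TREATMENT1.bed [TREATMENT2.bed ...]
make_swarm_file() {
    local out="$1" control="$2" bed name
    shift 2
    : > "$out"    # create or truncate the command file
    for bed in "$@"; do
        name=$(basename "$bed" .bed)   # e.g. file1.bed -> file1 (illustrative naming)
        # %q shell-quotes names containing spaces or other special characters
        printf 'macs2 callpeak -t %q -c %q --name %q\n' "$bed" "$control" "$name" >> "$out"
    done
}
```

For example, make_swarm_file /data/$USER/cmdfile Input_tags.bed file1.bed file2.bed file3.bed file4.bed reproduces the sample file above, except that the run names become file1 to file4 instead of test1 to test4.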

Submit this swarm with

$ swarm -f cmdfile --module macs

-f: name of the swarm command file
--module: load the macs module, which sets up the environment variables for each MACS job

If each MACS process (one line in the swarm file above) requires more than 1 GB of memory, specify this to swarm with the -g flag. For example, if each MACS run requires 4 GB of RAM, use

$ swarm -g 4 -f cmdfile --module macs

For more information on running swarm, see swarm.html

Documentation

https://github.com/taoliu/MACS/blob/master/README.rst