Biowulf at the NIH
align2rawsignal on Biowulf

align2rawsignal reads in a set of tagAlign/BAM files, filters out multi-mapping tags, and creates a consolidated genome-wide signal file using various tag-shift and smoothing parameters as well as various normalization schemes.

The method accounts for the following sources of variation

Several types of normalization are implemented. (See usage below)

This tool is primarily used with the following kinds of functional sequencing data

Align2rawsignal was developed by Anshul Kundaje. [align2rawsignal website].

Important Notes

1. There are several environment variables required for align2rawsignal to run correctly. It is simplest to use the command 'module load align2rawsignal' to set up the environment.

2. The file name prefixes (excluding the extensions) in the mappability directory and the sequence directory must be identical. Thus, you cannot directly use the Biowulf genome files, since there are other unrelated files in those directories. Instead, you should create your own sequence directory. To avoid storing redundant copies of the same data, you can simply link to the appropriate files in the Biowulf genome directory.

For example:

biowulf% cd mydir
biowulf% ls myinput
chr1.uniq   chr13.uniq  chr17.uniq  chr3.uniq  chr7.uniq  chrX.uniq
chr10.uniq  chr14.uniq  chr18.uniq  chr4.uniq  chr8.uniq  chrY.uniq
chr11.uniq  chr15.uniq  chr19.uniq  chr5.uniq  chr9.uniq
chr12.uniq  chr16.uniq  chr2.uniq   chr6.uniq  chrM.uniq

biowulf% mkdir mm9

biowulf% cd mm9

biowulf% ln -s /fdb/genome/mm9/chr?.fa .

biowulf% ln -s /fdb/genome/mm9/chr??.fa .

biowulf% ls -l
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr10.fa -> /fdb/genome/mm9/chr10.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr11.fa -> /fdb/genome/mm9/chr11.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr12.fa -> /fdb/genome/mm9/chr12.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr13.fa -> /fdb/genome/mm9/chr13.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr14.fa -> /fdb/genome/mm9/chr14.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr15.fa -> /fdb/genome/mm9/chr15.fa

Check that your mm9 directory has the same file name prefixes (everything before the extension) as your input directory -- no more, no less. Then use your own 'mm9' directory as the value of the -s parameter in align2rawsignal.
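
The check described above can also be scripted. The following is a minimal sketch, assuming a bash-compatible shell and the layout from the example (a directory of .uniq files and a directory of .fa links). The function names list_prefixes and compare_prefixes are hypothetical helpers written for this illustration; they are not part of align2rawsignal.

```shell
# Print the file names in directory $1 that end in extension $2,
# with the extension removed, one per line, sorted.
list_prefixes() {
    for f in "$1"/*"$2"; do
        # Skip the unexpanded glob (no matches) and broken symlinks.
        [ -e "$f" ] || continue
        basename "$f" "$2"
    done | sort
}

# Compare the prefixes of the .uniq files in $1 with the .fa files in $2.
# Returns 0 only if both lists are non-empty and identical.
compare_prefixes() {
    map_list=$(list_prefixes "$1" .uniq)
    seq_list=$(list_prefixes "$2" .fa)
    if [ -n "$map_list" ] && [ "$map_list" = "$seq_list" ]; then
        echo "OK: prefixes match"
    else
        echo "MISMATCH between $1 and $2"
        return 1
    fi
}
```

With the directories from the example, the call would be 'compare_prefixes myinput mm9', run from mydir. Note that this sketch only looks at files with the .uniq and .fa extensions, so unrelated files with other extensions in either directory would not be detected.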

Running an align2rawsignal job on Biowulf

The command 'module load align2rawsignal' will set up the paths for the align2rawsignal binary, the library paths for the Matlab compiler versions, and will set MCR_CACHE_ROOT and TMP to /scratch. Set up a batch script along the following lines:

# this file is myjob.bat

cd /data/$USER/mydir
module load align2rawsignal
align2rawsignal -i=file1.bam -s=myhg19  -u=/data/$USER/binmap \
    -o=/data/$USER/mydir/align2rawsignal.out -of=bg -m=20

Submit this job with:

qsub -l nodes=1:g24:c16 myjob.bat

The align2rawsignal parameter '-m=20' specifies that the program should use a maximum of 20 GB of memory. The job is therefore submitted to a 'g24' node, which has about 22 GB of usable memory. You will need to do some trial runs to estimate the most appropriate value for the -m parameter.
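
One possible way to carry out such a trial run is to measure peak memory with GNU time, whose -v option reports a 'Maximum resident set size' line in kilobytes. The sketch below assumes that GNU time is available as /usr/bin/time on the node and that its report was written to a file called time.log (a hypothetical name). The helper peak_gb is illustrative only and is not part of align2rawsignal.

```shell
# A trial run could be wrapped like this inside the batch script
# (arguments taken from the example above):
#
#   /usr/bin/time -v -o time.log align2rawsignal -i=file1.bam -s=myhg19 \
#       -u=/data/$USER/binmap -o=/data/$USER/mydir/align2rawsignal.out \
#       -of=bg -m=20

# Read a GNU time -v report ($1) and print the peak resident memory
# in GB, rounded up to the next whole number.
peak_gb() {
    awk -F': ' '/Maximum resident set size/ {
        kb = $2 + 0                      # value is reported in kilobytes
        gb = int(kb / 1048576)           # 1 GB = 1048576 KB
        if (gb * 1048576 < kb) gb++      # round up
        print gb
    }' "$1"
}
```

After the trial job finishes, 'peak_gb time.log' prints the measured peak. This is only a starting point for choosing -m, since memory use may differ between input files.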



Each version directory under /usr/local/apps/align2rawsignal also contains a README file.