align2rawsignal reads in a set of tagAlign/BAM files, filters out multi-mapping tags, and creates a consolidated genome-wide signal file using configurable tag-shift and smoothing parameters and a choice of normalization schemes.
The method accounts for the following sources of variation:
- the mappability of the genome (based on read length and ambiguous bases)
- distinguishes between positions that show 0 signal simply because they are unmappable and positions that are mappable but have no reads. The former are not represented in the output wiggle or bedGraph files, while the latter are represented as 0s.
- different tag shifts for the different datasets being combined
- depth of sequencing
- sequence bias (yet to be implemented)
- local input/control correction (yet to be implemented)
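To make the distinction between unmappable and zero-signal positions concrete, here is a minimal sketch that summarizes a bedGraph output file. It assumes the standard four-column bedGraph layout (chromosome, 0-based start, exclusive end, value) and counts every base covered by an interval as mappable, and every base in a 0-valued interval as mappable with no reads. Bases that appear in no interval are the unmappable ones, so their count is the genome length minus the mappable total. The function name bedgraph_summary is hypothetical and is not part of align2rawsignal.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of align2rawsignal.
# Summarize a 4-column bedGraph: chrom, start (0-based), end (exclusive), value.
# Bases covered by any interval are mappable; bases in 0-valued intervals are
# mappable but had no reads. Bases absent from the file are unmappable.
bedgraph_summary() {
    local bg="$1"
    awk 'BEGIN { OFS = "\t" }
         # skip browser/track header lines and comments
         /^(track|browser|#)/ { next }
         {
             len = $3 - $2
             covered += len
             if ($4 == 0) zero += len
         }
         END {
             print "mappable_bases", covered + 0
             print "zero_signal_bases", zero + 0
         }' "$bg"
}
```

For a bedGraph file produced with -of=bg, running bedgraph_summary on it prints the two totals summed over all chromosomes.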
Several types of normalization are implemented (see usage below).
This tool is primarily used with the following kinds of functional sequencing data:
- TF and histone ChIP-seq
- DNase-seq and FAIRE-seq
- MNase-seq for nucleosome positioning
1. align2rawsignal requires several environment variables to run correctly. The simplest way to set them up is with the command 'module load align2rawsignal'.
2. The file name prefixes (excluding the extensions) in the mappability directory and the sequence directory must be identical; for example, the mappability file chr1.uniq must be matched by the sequence file chr1.fa. You therefore cannot use the Biowulf genome directories directly, because they contain other, unrelated files. Instead, create your own sequence directory. To avoid storing redundant copies of the same data, you can simply link to the appropriate files in the Biowulf genome directory.
biowulf% cd mydir
biowulf% ls myinput
chr1.uniq   chr13.uniq  chr17.uniq  chr3.uniq  chr7.uniq  chrX.uniq
chr10.uniq  chr14.uniq  chr18.uniq  chr4.uniq  chr8.uniq  chrY.uniq
chr11.uniq  chr15.uniq  chr19.uniq  chr5.uniq  chr9.uniq
chr12.uniq  chr16.uniq  chr2.uniq   chr6.uniq  chrM.uniq
biowulf% mkdir mm9
biowulf% cd mm9
biowulf% ln -s /fdb/genome/mm9/chr?.fa .
biowulf% ln -s /fdb/genome/mm9/chr??.fa .
biowulf% ls -l
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr10.fa -> /fdb/genome/mm9/chr10.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr11.fa -> /fdb/genome/mm9/chr11.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr12.fa -> /fdb/genome/mm9/chr12.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr13.fa -> /fdb/genome/mm9/chr13.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr14.fa -> /fdb/genome/mm9/chr14.fa
lrwxrwxrwx 1 user user 24 Jun 10 14:03 chr15.fa -> /fdb/genome/mm9/chr15.fa
etc.
Check that your mm9 directory contains files with exactly the same name prefixes as your mappability directory (myinput in this example), no more and no less. Then use your own mm9 directory as the value of the -s parameter to align2rawsignal.
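The linking and checking steps above can also be scripted. The sketch below assumes the layout from the example session: mappability files named <prefix>.uniq and genome files named <prefix>.fa. It links only the FASTA files whose prefixes appear in the mappability directory, then compares the prefixes of the two directories. Both function names (link_matching_fasta, check_prefixes) are hypothetical and not provided by the align2rawsignal module.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of align2rawsignal or its module.

# link_matching_fasta MAPDIR GENOMEDIR SEQDIR
#   For every MAPDIR/<prefix>.uniq, symlink GENOMEDIR/<prefix>.fa into SEQDIR.
link_matching_fasta() {
    local mapdir="$1" genomedir="$2" seqdir="$3" f prefix
    mkdir -p "$seqdir"
    for f in "$mapdir"/*.uniq; do
        [ -e "$f" ] || continue          # glob matched nothing
        prefix=$(basename "$f" .uniq)
        if [ -e "$genomedir/$prefix.fa" ]; then
            ln -sf "$genomedir/$prefix.fa" "$seqdir/$prefix.fa"
        else
            echo "missing: $genomedir/$prefix.fa" >&2
        fi
    done
}

# check_prefixes MAPDIR SEQDIR
#   Prints any prefix (name minus last extension) found in only one of the
#   two directories and returns 1; returns 0 when the prefix sets match.
check_prefixes() {
    local mapdir="$1" seqdir="$2" diffs
    diffs=$(comm -3 \
        <(ls "$mapdir" | sed 's/\.[^.]*$//' | sort) \
        <(ls "$seqdir" | sed 's/\.[^.]*$//' | sort))
    if [ -n "$diffs" ]; then
        echo "$diffs"
        return 1
    fi
}

# Example, using the directories from the session above:
#   link_matching_fasta ~/mydir/myinput /fdb/genome/mm9 ~/mydir/mm9
#   check_prefixes ~/mydir/myinput ~/mydir/mm9 && echo "prefixes match"
```

Driving the links from the mappability directory, rather than from shell globs such as chr?.fa, avoids picking up extra genome files that have no matching .uniq file.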
The command 'module load align2rawsignal' sets up the path to the align2rawsignal binary and the library paths for the MATLAB Compiler Runtime (MCR), and sets MCR_CACHE_ROOT and TMP to /scratch. Set up a batch script along the following lines:
#!/bin/bash
#
# this file is myjob.bat

cd /data/$USER/mydir
module load align2rawsignal
align2rawsignal -i=file1.bam -s=myhg19 -u=/data/$USER/binmap \
    -o=/data/$USER/mydir/align2rawsignal.out -of=bg -m=20
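Because the job relies on the environment that 'module load align2rawsignal' sets up, a small guard placed after the module load line can make the job stop early with a clear message if MCR_CACHE_ROOT or TMP is unset. The variable names come from the description above; the function name require_env is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical guard: fail fast if variables that
# 'module load align2rawsignal' is expected to set are missing or empty.
require_env() {
    local var missing=0
    for var in "$@"; do
        # ${!var:-} is bash indirect expansion: the value of the variable named by $var
        if [ -z "${!var:-}" ]; then
            echo "error: $var is not set (did 'module load align2rawsignal' run?)" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In myjob.bat, right after 'module load align2rawsignal':
#   require_env MCR_CACHE_ROOT TMP || exit 1
```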
Submit this job with:
qsub -l nodes=1:g24:c16 myjob.bat
The align2rawsignal parameter '-m=20' specifies that the program should use a maximum of 20 GB of memory. The job is therefore submitted to a 'g24' node, which has about 22 GB of usable memory. You will need to do some trial runs to estimate the most appropriate value for the -m parameter.
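One way to run such trials, assuming GNU time is available as /usr/bin/time, is to wrap a test run with its -v option. That option reports "Maximum resident set size (kbytes)". The peak can then be converted to a whole number of GB for -m. The helper names below are hypothetical, and the 20% headroom is an arbitrary assumption. A trial on a subset of the data may also underestimate what a full run needs.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sizing align2rawsignal's -m parameter.

# peak_rss_kb LOGFILE: extract the peak RSS from GNU time -v output,
# e.g. the line "	Maximum resident set size (kbytes): 2048".
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }' "$1"
}

# suggest_m PEAK_KB: ceil(PEAK_KB * 1.2 / 1024^2), i.e. the peak in GB
# plus 20% headroom (an assumption), rounded up to a whole number.
suggest_m() {
    local peak_kb="$1"
    echo $(( (peak_kb * 12 + 10 * 1048576 - 1) / (10 * 1048576) ))
}

# Example trial run (arguments as in myjob.bat):
#   /usr/bin/time -v -o trial.time align2rawsignal -i=file1.bam ... -m=20
#   suggest_m "$(peak_rss_kb trial.time)"
```

Keep the resulting value below the usable memory of the node type you request, as with -m=20 on a 'g24' node.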
There is also a README file in each version's directory under /usr/local/apps/align2rawsignal.