Biowulf at the NIH
BioScope on Biowulf

SOLiD Bioscope (from Applied Biosystems) provides a command line interface for running application-specific sequence analysis tools. The Bioscope framework enables the user to perform off-instrument secondary and tertiary analyses, and it allows configurable bioinformatics workflows for resequencing (mapping, SNP finding (diBayes), copy number variations, inversions, small indels, large indels) and whole transcriptome analysis (mapping, counting, novel transcript finding, UCSC WIG file creation). Results are written in GFF v3 and SAM formats. These industry-standard files can be used with third-party visualization and analysis software tools.

How To Run

1. Contact staff@helix.nih.gov so that we can set up the disk space you need. The examples below assume this space is called /data/user.

2. The following instructions are based on the example dataset from Applied Biosystems. This data is available on Biowulf in /usr/local/bioscope-1.3.1/BioScope-1.3.rBS130-51653_20101021190735.examples.tar.gz. Copy the example tarfile into your own /data area, then unzip and untar it there:

$ mkdir /data/user/run1
$ cd /data/user/run1
$ cp /usr/local/bioscope-1.3.1/BioScope-1.3.rBS130-51653_20101021190735.examples.tar.gz .
$ tar xvfz BioScope-1.3.rBS130-51653_20101021190735.examples.tar.gz

3. Log in to the Biowulf head node.

4. Initialize the environment variables according to the memory your jobs require.
- 'mapping' and 'saet' analysis use more memory in general and can be run on 'g24' or 'g72' nodes.
- Most other jobs can be run on 'g4' or 'g8' nodes.
- There are a lot more g4, g8, and g24 nodes available than g72 nodes on the cluster right now.
Note: jobs that run on g24 or g72 nodes but do not require 24 GB or 72 GB of memory will be killed before finishing.

After choosing the [node] from (g4, g8, g24, g72), substitute your choice into the following command line, depending on your shell:

For csh/tcsh users:
biowulf> source /usr/local/bioscope-1.3.1/bioscope-[node]_profile.csh

For bash users:
biowulf> source /usr/local/bioscope-1.3.1/bioscope-[node]_profile.sh

Verify your environment by doing this:
$ echo $BIOSCOPEROOT
/usr/local/bioscope-1.3.1/bioscope-[node]
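
For bash users, the profile selection above could be scripted. The following is only a minimal sketch: the function name bioscope_profile is hypothetical, and it assumes the profile files follow exactly the naming pattern shown above. It builds the profile path for a chosen node class, sources it if it exists, and prints BIOSCOPEROOT for verification.

```shell
#!/bin/bash
# Install prefix taken from the instructions above.
BIOSCOPE_BASE=/usr/local/bioscope-1.3.1

# Hypothetical helper: print the profile path for a node class (g4, g8, g24, g72)
# and a shell family (bash by default, or csh/tcsh).
bioscope_profile() {
    local node="$1" shell_family="${2:-bash}" ext
    case "$node" in
        g4|g8|g24|g72) ;;
        *) echo "unknown node class: $node" >&2; return 1 ;;
    esac
    case "$shell_family" in
        csh|tcsh) ext=csh ;;
        *)        ext=sh ;;
    esac
    echo "$BIOSCOPE_BASE/bioscope-${node}_profile.$ext"
}

# Example: a 'mapping' run, which the list above places on g24 or g72 nodes.
profile=$(bioscope_profile g24 bash)
if [ -r "$profile" ]; then
    . "$profile"
    echo "BIOSCOPEROOT=$BIOSCOPEROOT"
else
    echo "profile not found: $profile" >&2
fi
```

Validating the node class before sourcing avoids silently continuing with an unset BIOSCOPEROOT when the name is mistyped.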

5. Go to /data/user/run1/examples/demos, edit the 'global.ini' file under global/, and change the scratch directory line to 'scratch.dir=/scratch'. The scratch.dir must be local to each compute node, not shared, so space under /data cannot be used.

$ cd /data/user/run1/examples/demos
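
The global.ini change in step 5 can also be made non-interactively. The sketch below is an illustration only: the function name set_scratch_dir is hypothetical, and it assumes the file contains a line that starts with 'scratch.dir=' (with no spaces around the equals sign). If your copy is formatted differently, edit it by hand instead. Run it from the demos directory.

```shell
# Hypothetical helper: rewrite the scratch.dir line of an ini file in place.
# Usage: set_scratch_dir <ini-file> <directory>
set_scratch_dir() {
    ini="$1"
    dir="$2"
    tmp="$ini.tmp.$$"
    # Replace the existing scratch.dir line; all other lines are copied unchanged.
    sed "s|^scratch\.dir=.*|scratch.dir=$dir|" "$ini" > "$tmp" && mv "$tmp" "$ini"
}

ini=global/global.ini
if [ -f "$ini" ]; then
    cp "$ini" "$ini.bak"                 # keep a backup of the original
    set_scratch_dir "$ini" /scratch
    # Show the result, or warn if no matching line existed to rewrite.
    grep '^scratch\.dir=' "$ini" || echo "no scratch.dir line found in $ini" >&2
fi
```

The backup copy makes it easy to restore the shipped settings if a later run needs them.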

6. Start the 'run.sh' program under /data/user/run1/examples/demos, which runs all the modules. Alternatively, run an individual module by going into that module's directory and starting its 'run.sh' program.

$ run.sh

7. The run.sh program submits jobs to the Biowulf cluster. You can check your jobs by running 'qstat -u userid' and 'jobload -mc userid', where userid is your own user ID.

$ qstat -u user

Documentation

http://solidsoftwaretools.com/gf/project/bioscope/docman/

Scientist's Guide