Biowulf at the NIH
STAR on Helix & Biowulf

STAR aligns RNA-seq reads to a reference genome.

Its advantages include:

STAR was developed by Alex Dobin. STAR website

The STAR executable can be added to your path by typing 'module load STAR' at the prompt, or by including that command in a batch script.

Running STAR on Helix

Sample session:

helix% module load STAR

helix% STAR --runMode genomeGenerate --genomeDir /fdb/genome/ \
--genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --runThreadN 4 ...

helix%

Note that you should run a maximum of 4 threads on Helix.
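
The thread limit above can be enforced in a script rather than remembered. The following is a minimal sketch: cap_threads is a hypothetical helper (not part of STAR or the module system) that returns the requested thread count or the cap, whichever is smaller. The default cap of 4 is taken from the Helix limit stated above.

```shell
#!/bin/bash
# Sketch: pick a --runThreadN value that never exceeds a cap.
# cap_threads is a hypothetical helper, not part of STAR.
cap_threads() {
    local requested=$1
    local cap=${2:-4}   # default cap of 4 matches the Helix limit
    if [ "$requested" -gt "$cap" ]; then
        echo "$cap"
    else
        echo "$requested"
    fi
}

# Start from the number of available cores (fall back to 1 if nproc is missing).
threads=$(cap_threads "$(nproc 2>/dev/null || echo 1)")

# Only print the command here; remove the echo to run STAR for real.
echo "STAR --runThreadN $threads ..."
```

On a machine with more than 4 cores this prints a command with --runThreadN 4; on a smaller machine it uses the core count.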


Submitting a single batch job on Biowulf

1. Create a script file. Sample batch script file:

#!/bin/bash
# This file is starScript
#PBS -N star
#PBS -m be
#PBS -k oe

module load STAR 

cd /data/user/mydir 

STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir \
--genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --runThreadN <n> …

2. Submit the script using the 'qsub' command on Biowulf, with, for example:

$ qsub -l nodes=1:g24:c16 ./starScript

This job will run on a g24 node (24 GB of memory). You may need to run a few test jobs to determine how much memory your job requires, and then choose a node type suitable for it.
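
One way to measure the memory used by a test job is to run it under GNU time, which reports the peak resident set size. The sketch below assumes GNU time is installed as /usr/bin/time on the node. peak_rss_gb and fits_g24 are hypothetical helper names, and star_time.log is a placeholder file name. The 24 GB figure is the g24 node size mentioned above; in practice some headroom below that figure is advisable.

```shell
#!/bin/bash
# Sketch: read the peak memory of a test run from GNU time output.
# A test run would be recorded with something like:
#   /usr/bin/time -v -o star_time.log STAR --runMode genomeGenerate ...

# peak_rss_gb (hypothetical helper): print the peak resident set size in GB,
# taken from the "Maximum resident set size (kbytes)" line of a time -v log.
peak_rss_gb() {
    awk -F': ' '/Maximum resident set size/ { printf "%.1f\n", $2 / 1024 / 1024 }' "$1"
}

# fits_g24 (hypothetical helper): succeed if the given number of GB
# does not exceed the 24 GB of a g24 node.
fits_g24() {
    awk -v gb="$1" 'BEGIN { exit !(gb <= 24) }'
}
```

For example, `fits_g24 "$(peak_rss_gb star_time.log)" && echo "g24 is large enough"` reports whether the measured peak fits on a g24 node.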

Running an interactive job on Biowulf

You may sometimes need to run STAR interactively. Interactive jobs should not be run on the Biowulf login node.

Allocate an interactive node as described below, and run the interactive job there. Alternatively, run STAR interactively on Helix.

biowulf% qsub -I -l nodes=1:g24:c16
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@pxxx]$ cd YourDir

[user@pxxx]$ module load STAR

[user@pxxx]$ STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir \
--genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --runThreadN <n> …
[user@pxxx]$ exit
qsub: job 2236960.biobos completed

[user@biowulf ~]$

Note regarding multiple STAR processes using shared memory

STAR can keep a genome reference loaded in shared memory, so that later STAR processes on the same node can use it without reloading it. This can speed up jobs by up to 4x. To do so, first run a single STAR process with the LoadAndExit directive:

STAR --genomeDir /path/to/genome/reference/ --genomeLoad LoadAndExit

Then run the regular STAR processes with the LoadAndKeep directive:

STAR --genomeDir /path/to/genome/reference/ --genomeLoad LoadAndKeep ... other options ... ;
STAR --genomeDir /path/to/genome/reference/ --genomeLoad LoadAndKeep ... other options ... ;
STAR --genomeDir /path/to/genome/reference/ --genomeLoad LoadAndKeep ... other options ... ;
STAR --genomeDir /path/to/genome/reference/ --genomeLoad LoadAndKeep ... other options ...  

Lastly, run STAR with the Remove directive:

STAR --genomeDir /path/to/genome/reference --genomeLoad Remove
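
The load, align, and remove steps above can be wrapped in one script so that the genome is released even if an alignment fails. This is a minimal sketch, not a tested Biowulf recipe. star_shared_batch, run, GENOME_DIR and DRY_RUN are hypothetical names, and the sample names and the "<sample>.fastq" naming are placeholders. --readFilesIn and --outFileNamePrefix are standard STAR options that do not appear in the examples above. By default the script only prints the commands it would run.

```shell
#!/bin/bash
# Sketch: load the genome once, align several samples, then free the shared memory.
GENOME_DIR=${GENOME_DIR:-/path/to/genome/reference}   # placeholder path
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only; set to 0 to execute them

# run (hypothetical helper): print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# star_shared_batch (hypothetical helper): usage: star_shared_batch sample1 sample2 ...
star_shared_batch() {
    local sample status=0

    # Step 1: load the genome into shared memory and exit.
    run STAR --genomeDir "$GENOME_DIR" --genomeLoad LoadAndExit || return 1

    # Step 2: align each sample against the genome already held in memory.
    for sample in "$@"; do
        run STAR --genomeDir "$GENOME_DIR" --genomeLoad LoadAndKeep \
            --readFilesIn "${sample}.fastq" --outFileNamePrefix "${sample}_" || status=1
    done

    # Step 3: always remove the genome from shared memory, even after a failure.
    run STAR --genomeDir "$GENOME_DIR" --genomeLoad Remove
    return $status
}
```

Calling `star_shared_batch sampleA sampleB` in dry-run mode prints four STAR commands: one LoadAndExit, two LoadAndKeep, and one Remove.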

STAR manual (PDF)