Biowulf at the NIH
RandFold on Biowulf


   A randomization test for sequence secondary structure.


This is RandFold version 2. For a given RNA sequence, the software computes the probability that the Minimum Free Energy (MFE) of its secondary structure differs from the distribution of MFE values computed for randomized (shuffled) versions of the sequence.

RandFold was developed by Eric Bonnet at the Bioinformatics & Evolutionary Genomics group, Universiteit Gent in Belgium.  A web page referencing the research may be found at bioinformatics.psb.ugent.be.

RandFold is not a parallel program. Small numbers of RandFold jobs, or interactive RandFold runs, can be run on Helix or on the Biowulf interactive nodes. If you have many RandFold jobs to run, the swarm utility is recommended.

Before running RandFold, load the randfold module as shown in the examples below. If you plan to use RandFold regularly, you can add the command 'module load randfold' to your .bashrc or .cshrc file instead of typing it in each batch script or interactive session.
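One way to add the line to a startup file without duplicating it is sketched below. This is an illustration, not a site-provided tool: add_module_load is a hypothetical helper name, and the sketch assumes a POSIX shell with grep available.

```shell
# add_module_load is a hypothetical helper, not part of the modules system.
# It appends 'module load randfold' to the given startup file
# unless an identical line is already present.
add_module_load() {
    rcfile="$1"
    line='module load randfold'
    touch "$rcfile"    # create the file if it does not exist yet
    grep -qxF "$line" "$rcfile" || printf '%s\n' "$line" >> "$rcfile"
}
```

For example, calling add_module_load ~/.bashrc twice leaves a single copy of the line in the file. The same line works in .cshrc.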

RandFold Options

Syntax:
module load randfold
randfold <method> <file name> <number of randomizations>
Methods available:
-s  simple mononucleotide shuffling
-d  dinucleotide shuffling
-m  markov chain 1 shuffling

Output:
<sequence name> tab <mfe> tab <probability>
Example:
cel-let-7       -42.90  0.001000
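Because each output line consists of three tab-separated fields, results collected in a file can be post-processed with standard tools. The following is a minimal sketch, not part of RandFold: filter_randfold is a hypothetical helper name, and it assumes the output file contains only lines in the format shown above.

```shell
# filter_randfold is a hypothetical helper, not part of RandFold.
# Usage: filter_randfold <cutoff> <output file>
# Prints the lines whose third field (the probability) is at or below the cutoff.
filter_randfold() {
    # Adding 0 forces awk to compare the values as numbers, not strings.
    awk -F'\t' -v cutoff="$1" '$3 + 0 <= cutoff + 0 { print }' "$2"
}
```

For example, filter_randfold 0.01 let7.out would keep the cel-let-7 line shown above; the 0.01 cutoff is an arbitrary illustrative threshold, not a recommendation.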

Running an interactive RandFold job

Please note that only very short jobs (< 1 min) should be run on the Biowulf head node. For interactive jobs running longer than one minute, please request an interactive batch node and run them there.

[user@biowulf ~]$ qsub -I -l nodes=1
qsub: waiting for job 2948120.biobos to start
qsub: job 2948120.biobos ready

[user@p3 ~]$ module load randfold
[user@p3 ~]$ cd mydir
[user@p3 ~/mydir]$ /usr/local/randfold-2.0/bin/randfold -d let7.tfa 999
cel-let-7       -42.90  0.001000
[user@p3 ~/mydir]$ 
[user@p3 ~/mydir]$ 
[user@p3 ~/mydir]$ exit
logout

qsub: job 2948120.biobos completed
[user@biowulf ~]$

Submitting a single RandFold batch job

Single RandFold jobs would typically be submitted only for debugging purposes.

1. Create a script file which contains the RandFold commands as below:

--------- /data/username/randfold/testrun1/script ------------
#!/bin/bash -v
#PBS -N randfoldJobName
#PBS -m be
#PBS -k oe

module load randfold

cd /data/username/randfold/testrun1
/usr/local/randfold-2.0/bin/randfold -d let7.tfa 999 > ./let7.out
--------------------- end of script --------------------------

2. Now submit the script using the 'qsub' command, e.g.

[user@biowulf ~]$ qsub -l nodes=1 /data/username/randfold/testrun1/script

Submitting a swarm of RandFold jobs

The swarm program is a convenient way to submit large numbers of jobs all at once instead of manually submitting them one by one. In this case, it's probably best to add 'module load randfold' to your .bashrc or .cshrc file.

1. First create a directory for the RandFold runs and put the required input files in it. You can also use a separate directory for each run; the example below uses a single directory and a separate output file per run.

[user@biowulf ~]$ mkdir /data/username/randfold

2. For each run, create a script file (named rfX.script in this example) which contains the RandFold command as below. Make sure each file is executable:

----------- /data/username/randfold/rf1.script -----------
cd /data/username/randfold
/usr/local/randfold-2.0/bin/randfold -d let7.tfa 999 > ./let7.out
----------------------------------------------------------

----------- /data/username/randfold/rf2.script -----------
cd /data/username/randfold
/usr/local/randfold-2.0/bin/randfold -d let8.tfa 999 > ./let8.out
----------------------------------------------------------

3. Now prepare the swarm command file (named cmdfile below), e.g.

-------------- cmdfile ----------------
/data/username/randfold/rf1.script
/data/username/randfold/rf2.script
.....
....
/data/username/randfold/rfX.script
----------- end of cmdfile ------------
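With many input files, steps 2 and 3 can be automated. The following is a minimal sketch under stated assumptions, not a site-provided tool: make_swarm_files is a hypothetical helper name, all inputs are assumed to be *.tfa files in one directory whose path contains no spaces, and every run uses dinucleotide shuffling (-d) as in the examples above.

```shell
# make_swarm_files is a hypothetical helper, not part of swarm or RandFold.
# Usage: make_swarm_files <run directory> <number of randomizations>
# Writes one rfN.script per *.tfa file, plus a cmdfile listing the scripts.
make_swarm_files() {
    rundir="$1"
    nrand="$2"
    cmdfile="$rundir/cmdfile"
    : > "$cmdfile"                      # start with an empty command file
    i=0
    for fasta in "$rundir"/*.tfa; do
        [ -e "$fasta" ] || continue     # skip if no .tfa files matched
        i=$((i + 1))
        base=$(basename "$fasta" .tfa)
        script="$rundir/rf$i.script"
        {
            echo "cd $rundir"
            echo "/usr/local/randfold-2.0/bin/randfold -d $base.tfa $nrand > ./$base.out"
        } > "$script"
        chmod +x "$script"              # each script must be executable
        echo "$script" >> "$cmdfile"
    done
}
```

For example, make_swarm_files /data/username/randfold 999 would write rf1.script, rf2.script, and so on, along with /data/username/randfold/cmdfile, which can then be submitted as described in step 4.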

4. If each RandFold process requires less than 1 GB of memory, submit this to the batch system with the command:

swarm -f cmdfile

If each RandFold process requires more than 1 GB of memory, use

swarm -g # -f cmdfile

where '#' is the number of gigabytes of memory required by each RandFold process.

See the swarm documentation for details.

More information

For more information, see the paper published in Bioinformatics:

Bonnet E., Wuyts J., Rouze P., Van de Peer Y.
Evidence that microRNA precursors, unlike other non-coding RNAs, have lower folding free energies than random sequences.
Bioinformatics. 2004 Nov 22;20(17):2911-7.
PMID: 15217813

A large collection of protein sequence databases is in /fdb/fastadb/.
Fasta-format databases and update status.