A limited number of Schrödinger applications are available on the Biowulf cluster through the Molecular Modeling Interest Group. The use of Schrödinger applications is limited to intramural NIH users only.
Schrödinger applications can be run from the command line, through the batch system, and interactively through the Maestro GUI. See the documentation link below for more information, particularly the Job Control Guide.
There are several versions of Schrödinger available. As of December 2012, the following modules are installed:

Version   Year   Module
8.5.207   2008   schrodinger/8.5.207
9.0.109   2009   schrodinger/9.0.109
9.0.211   2009   schrodinger/9.0.211
9.1.107   2010   schrodinger/9.1.107
9.2.109   2011   schrodinger/9.2.109
2011.1    2011   schrodinger/2011.1
2012.2    2012   schrodinger/2012.2
2013.2    2013   schrodinger/2013.2
Please contact the Molecular Modeling Interest Group for current version status and any differences between versions.
The simplest way to set your environment is to use environment modules. The following line can be added to your ~/.bashrc or ~/.cshrc file to make the setting permanent:
module load schrodinger
Additionally, Schrödinger applications make use of two other environment variables, which control where files are stored. SCHRODINGER_TMPDIR is a scratch directory, set by default to ~/.schrodinger/tmp. SCHRODINGER_JOBDB2 is a directory containing information for the job database, set by default to ~/.schrodinger/.jobdb.
Because your /home directory is restricted to 8GB, it is strongly recommended to point these optional environment variables at your /data directory. Otherwise, you risk running out of space during a job.
setenv SCHRODINGER_TMPDIR /data/[user]/schrodinger/tmp
setenv SCHRODINGER_JOBDB2 /data/[user]/schrodinger/jobdb
where [user] would be replaced by your username.
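The setenv lines above are for csh/tcsh. For bash users (e.g., in ~/.bashrc), the equivalent export commands are sketched below, assuming your Biowulf username matches your local $USER:

```shell
# bash equivalents of the csh setenv commands above;
# assumes your Biowulf data directory is /data/$USER
export SCHRODINGER_TMPDIR=/data/$USER/schrodinger/tmp
export SCHRODINGER_JOBDB2=/data/$USER/schrodinger/jobdb
```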
Job control is handled through the hosts file, schrodinger.hosts. When a Schrödinger application is started, it reads the schrodinger.hosts file to determine various attributes for job control.
Here are the keywords for schrodinger.hosts:
name
    The name of the entry. For a host this is usually the host name, but any name can be used. This name is displayed in Maestro by job control, and is used for the -HOST option. The name must not contain spaces. The value localhost is a special value that means the host on which the job is launched.
host
    The host name. Required for batch queue entries; otherwise only needed if it is different from name.
nodelist
    List of entry names, used to define a multiple-host entry. Each name may be followed by a colon and a number of processors. Can be combined with a host setting.
schrodinger
    The path to the Schrödinger software installation on the host. You can include more than one of these settings. This value is superseded by the $SCHRODINGER environment variable.
user
    The user name to use on the host. This should never be set in the hosts file in the installation directory. It is required only if the user has a different user name on the defined host than on the host from which the job is launched.
processors
    The number of processors available on the host. If the host is part of a cluster, this number should be the total number of processors available on the cluster. The default is 1.
env
    Environment variables to be set on the specified host. The syntax is variable=value, regardless of the shell used. List each environment variable on a separate env line. Not used on the submission host.
tmpdir
    Base directory for temporary or scratch files, also called the scratch directory. The actual directory created for scratch files (the job directory) is tmpdir/username/uniquename, where tmpdir is the directory defined here and username is the user name. Multiple tmpdir settings can be added for a given host and are used by Maestro, but only the first setting is used otherwise. This value is superseded by the $SCHRODINGER_TMPDIR environment variable.
queue
    Queuing system name. PBS is the supported system on Biowulf. Must be set to the subdirectory of $SCHRODINGER/queues that contains the support files for the queuing system.
qargs
    Arguments to be used when submitting jobs to a batch queue. These arguments should specify any parameters that define the queue.
include
    The name of an auxiliary hosts file to be included in the current hosts file. The inclusion is done by replacing the include line with the contents of the specified file.
base
    The name of an entry (the base entry) that is the basis for the current entry. All the keywords from the base entry are inherited by the current entry, and new keywords may be added, in any order. May be used recursively.
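The base keyword can reduce duplication when several entries share settings. A hypothetical fragment (entry names invented for illustration) in which an entry inherits common PBS settings from a base entry:

```
# Shared settings; not used directly for jobs
name: pbs-base
host: localhost
queue: PBS
tmpdir: /scratch

# Inherits host, queue, and tmpdir from pbs-base
name: one-node
base: pbs-base
qargs: -l nodes=1:c16
processors: 16
```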
Here is an example schrodinger.hosts file that can be used when running a Schrödinger application on a single node interactively (e.g., Glide). Each node is defined by the name keyword, and the number of processors available on the node is defined by the processors keyword. Because the application does not require shared scratch space, the tmpdir keyword is set to /scratch.
#
name: p1835
tmpdir: /scratch
processors: 4
#
name: p1859
tmpdir: /scratch
processors: 4
#
Here is another example schrodinger.hosts file that can be used when running a Schrödinger application on multiple nodes via the PBS batch system (e.g., Desmond). The name keyword simply defines a name for the entry, and host is the machine that is running the PBS queue manager. queue is defined as PBS, and qargs is set to the simplest qsub command-line options.
The tmpdir keyword is set to a directory that is shared by all nodes in the cluster, as the distributed application will require shared access. Finally, the processors keyword defines the total number of processors available to the user.
The 16CPU entry would be used if the application is to run on a single node using 16 processors; in this case, tmpdir can be /scratch. The 96CPU entry would be used if the application is to use 96 processors across 4 c24 nodes. qargs is set to request 4 c24 nodes (which have 24 processors each), and tmpdir is set to the user's data directory (substitute user with your username before running).
#
name: 16CPU
host: localhost
queue: PBS
qargs: -l nodes=1:c16
tmpdir: /scratch
processors: 16
#
name: 96CPU
host: localhost
queue: PBS
qargs: -l nodes=4:c24
tmpdir: /data/user
processors: 96
#
When running on the Biowulf cluster, it is not always known ahead of time what nodes will be allocated, or how many processors are available on each node. The script /usr/local/bin/make_schrodinger_hostfile.sh will create the file schrodinger.hosts in the current working directory using current values from $PBS_NODEFILE. Previous schrodinger.hosts files will be preserved.
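The actual script may differ, but its behavior can be sketched roughly as follows (the fake node list stands in for a real $PBS_NODEFILE, which lists each node once per allocated processor):

```shell
# Rough sketch of what make_schrodinger_hostfile.sh does -- not the
# actual script.  $PBS_NODEFILE lists one node name per line, repeated
# once for each processor allocated on that node.
cd "$(mktemp -d)"                                  # demo in a scratch dir
printf 'p1835\np1835\np1859\np1859\n' > nodefile   # stand-in for $PBS_NODEFILE

# Preserve any previous hosts file rather than overwriting it
[ -f schrodinger.hosts ] && mv schrodinger.hosts schrodinger.hosts.bak

# One entry per node; processor count = number of times the node is listed
sort nodefile | uniq -c | while read count node; do
    printf 'name: %s\ntmpdir: /scratch\nprocessors: %s\n#\n' \
           "$node" "$count" >> schrodinger.hosts
done
cat schrodinger.hosts
```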
For more information about job control and the schrodinger.hosts file, please see the Job Control Guide.
[biowulf]$ cat bmintest.sh
#!/bin/bash
#PBS -N bmintest

cd $PBS_O_WORKDIR
module load schrodinger/9.3.5
bmin test2 -WAIT
qsub -l nodes=1 bmintest.sh
Maestro is a GUI for running Schrödinger jobs interactively. It requires an X-Windows connection.
Maestro should be run on an interactive node, not on the login node. Make sure you have an X-Windows connection to Biowulf. Start an interactive session, making sure to export all environment variables with the -V flag, then start maestro:
[user@biowulf ~]$ qsub -I -V -l nodes=1
qsub: waiting for job 99999999.biobos to start
qsub: job 99999999.biobos ready

[user@p2 ~]$
See http://biowulf.nih.gov/user_guide.html#interactive for more information on running interactive sessions on Biowulf.
Once an interactive node has been allocated, set the proper environment variables, and type $SCHRODINGER/maestro &.
IMPORTANT NOTE: Due to incompatibilities between the OpenGL/GLX commands issued by the Maestro GUI and the X11 server running on your desktop client machine, the interface may hang or crash with errors. If this is the case, Maestro can be run with the -SGL option:

$SCHRODINGER/maestro -SGL &
Glide runs can be submitted in parallel using interactive nodes and the schrodinger.hosts file:
- Set the proper environment variables.
- Allocate nodes interactively, rather than in batch, making sure to include environment variables in the command line:
[biowulf]$ qsub -V -I -l nodes=[num]:[prop],walltime=8:00:00
where num = number of nodes and prop = node properties, as usual. The walltime should be set to the number of hours:minutes:seconds the job is expected to run. Walltime defaults to 36 hours if none is given, and at most 8 nodes may be allocated in a single interactive job.
- Once the job has started, create a brand new schrodinger.hosts file:

[p2]$ /usr/local/bin/make_schrodinger_hostfile.sh
This will find what nodes are allocated to the current job via the $PBS_NODEFILE variable, then create separate entries in the schrodinger.hosts file for each node. If the hosts file already exists, it creates a backup version instead of overwriting. The script determines the number of processors on each node and writes this to the hosts file.
To be very sure that everything is working correctly, you can run a test:
The hunt command attempts to send a command to and from each node in the schrodinger.hosts file. If this is successful, then start up maestro and launch your Glide job as parallel subjobs.
The above procedure can be used to launch serial processes onto the cluster.
Multiprocessor Desmond runs can be submitted through the batch system in a manner similar to Glide. Because Desmond jobs should be run exclusively on the cluster, they can be launched from the Biowulf login node.
- Set the proper environment variables. This is very important, as Desmond ignores the env, tmpdir, and schrodinger keywords in the schrodinger.hosts file.
- Create a brand new schrodinger.hosts file:

[biowulf]$ /usr/local/bin/make_schrodinger_hostfile.sh
- Start Maestro as usual:
[biowulf]$ maestro &
- Open the Run Desmond window, import files, and configure the run. Click the Start button, choose the host that has the desired number of processors (the number in parentheses), then click Start again to submit the job to the batch system. IF YOU ARE SUBMITTING FROM THE BIOWULF LOGIN NODE, DO NOT USE localhost!
Desmond jobs will not scale well beyond 8 CPUs, so it is not advisable to use CPUs/subjob values larger than 8. Below is a plot of clocktimes versus the number of cores used for a molecular dynamics run on a 12348-atom system. The simulation time was 1.2 ns, run in the NPT ensemble at 300 K. As can be seen, the simulation does not scale well beyond 4 cores (efficiency < 50%), and the clocktime essentially plateaus at 8 or more cores.
For a more efficient molecular dynamics application, and a better explanation of efficiency, please see NAMD Benchmarks.
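Parallel efficiency here means the speedup relative to one core divided by the number of cores: efficiency = t1 / (n * tn), where t1 is the one-core clocktime and tn is the clocktime on n cores. A small helper illustrates the calculation (the numbers below are made up for illustration, not the benchmark values from the plot):

```shell
# efficiency t1 n tn -> percentage of ideal speedup achieved on n cores
efficiency() {
    awk -v t1="$1" -v n="$2" -v tn="$3" \
        'BEGIN { printf "%.0f%%\n", 100 * t1 / (n * tn) }'
}

efficiency 10 4 2.5   # 4 cores, 4x speedup: perfect scaling (100%)
efficiency 10 8 2.5   # 8 cores but still only 4x speedup: 50% efficiency
```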
You can monitor the progress of the Desmond run from the monitor window: