Programs

Programs summary tables:

Table 19 Core rDock C++ executables

| Executable | Used for | Description |
|---|---|---|
| rbcavity | Preparation | Cavity mapping and preparation of docking site (.as) file. |
| rbcalcgrid | Preparation | Calculation of vdW grid files (usually called by make_grid.csh wrapper script). |
| rbdock | Docking | The main rDock docking engine itself. |

Table 20 Auxiliary rDock programs

| Executable | Used for | Description |
|---|---|---|
| sdtether | Preparation | Prepares a ligand SD file for tethered scaffold docking. Annotates ligand SD file with tethered substructure atom indices. Requires Open Babel Python bindings. |
| rbhtfinder | Preparation | Optimises a high-throughput docking protocol from an initial exhaustive docking of a small, representative ligand library. Use it to parametrize a multi-step protocol for your system. |
| make_grid.csh | Preparation | Creates the vdW grid files required for grid-based docking protocols (dock_grid.prm and dock_solv_grid.prm). Simple front-end to rbcalcgrid. |
| rbconvgrid | Analysis | Converts rDock vdW grids to InsightII grid format for visualisation. |
| rbmoegrid | Analysis | Converts rDock vdW grids to MOE grid format for visualisation. |
| rblist | Analysis | Outputs miscellaneous information for ligand SD file records. |
| sdrmsd | Analysis | Calculation of ligand Root Mean Squared Displacement (RMSD) between reference and docked poses, taking into account ligand topological symmetry. Requires Open Babel Python bindings. |
| sdfilter | Analysis | Utility for filtering SD files by arbitrary data field expressions. Useful for simple post-docking filtering by score components. |
| sdsort | Analysis | Utility for sorting SD files by arbitrary data field. Useful for simple post-docking filtering by score components. |
| sdreport | Analysis | Utility for reporting SD file data field values. Output in tab-delimited or CSV format. |
| sdsplit | Utility | Splits an SD file into multiple smaller SD files, each with a fixed number of records. |
| sdmodify | Utility | Sets the molecule title line of each SD record equal to a given SD data field. |

Programs reference

rbdock

rbdock – The rDock docking engine itself.

$RBT_ROOT/bin/rbdock
{-i input ligand MDL SD file}
{-o output MDL SD file}
{-r system definition .prm file}
{-p docking protocol .prm file}
[-n number of docking runs/ligand]
[-s random seed]
[-T debug trace level]
[[-t SCORE.INTER threshold] | [-t filter definition file]]
[-ap -an -allH -cont]

Simple exhaustive docking

The minimum requirement for rbdock is to specify the input (-i) and output (-o) ligand SD file names, the system definition .prm file (-r) and the docking protocol .prm file (-p). This will perform one docking run per ligand record in the input SD file and output all docked ligand poses to the output SD file. Use -n to increase the number of docking runs per ligand record.

High-throughput docking, option 1

The -t and -cont options can be used to construct high-throughput protocols. If the argument following -t is numeric it is interpreted as a threshold value for SCORE.INTER, the total intermolecular score between ligand and receptor/solvent. In the absence of -cont, the threshold acts as an early termination filter, and the docking runs for each ligand will be terminated early once the threshold value has been exceeded. Note that the threshold is applied only at the end of each individual docking run, not during the runs themselves. If the -cont (continue) option is specified as well, the threshold acts as an output pose filter instead of a termination filter. The docking runs for each ligand run to completion as in the exhaustive case, but only the docking poses that exceed the threshold value of SCORE.INTER are written to the output SD file.
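The two roles of a numeric -t threshold can be sketched as follows. This is an illustration, not rbdock's code; it assumes that lower SCORE.INTER is better (so a pose "exceeds" the threshold when its score is below it) and, for the case without -cont, that every completed run writes its pose as in exhaustive docking.

```python
"""Sketch of rbdock's numeric -t behaviour, with and without -cont.

Assumptions (not taken from rDock source): lower SCORE.INTER is better, so a
pose "exceeds" the threshold when its score is below it; without -cont every
completed run is assumed to write its pose, as in exhaustive docking.
"""
from typing import List, Tuple


def run_with_threshold(scores: List[float], threshold: float,
                       cont: bool) -> Tuple[int, List[float]]:
    """Return (number of runs performed, SCORE.INTER of poses written)."""
    written: List[float] = []
    for run, score in enumerate(scores, start=1):
        # The threshold is only checked once each run has finished.
        passes = score < threshold
        if cont:
            # -cont: all runs complete; the threshold filters output poses.
            if passes:
                written.append(score)
        else:
            # No -cont: terminate this ligand as soon as one run passes.
            written.append(score)
            if passes:
                return run, written
    return len(scores), written
```

With scores of -5, -12, -20 and -8 and a threshold of -10, the sketch stops after the second run without -cont, but performs all four runs and writes only the -12 and -20 poses with -cont.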

High-throughput docking, option 2

Alternatively, if the argument following -t is non-numeric it is interpreted as a filter definition file. The filter definition file can be used to define multiple termination filters and multiple output pose filters in a generic way. Any docking score component can be used in the filter definitions. run_rbscreen.pl generates a filter definition file for multi-stage, high-throughput docking, with progressive score thresholds for early termination of poorly performing ligands. The use of filter definition files is preferred over the more limited SCORE.INTER filtering described above, whose use is now deprecated.

Automated ligand protonation/deprotonation

The -ap option activates the automated protonation of ligand positive ionisable centres, notably amines, guanidines, imidazoles, and amidines. The -an option activates the automated deprotonation of ligand negative ionisable centres, notably carboxylic acids, phosphates, phosphonates, sulphates, and sulphonates. The precise rules used by rDock for protonation and deprotonation are quite crude, and are not user-customisable. Therefore these flags are not recommended for detailed validation experiments, in which care should be taken that the ligand protonation states are set correctly in the input SD file. Note that rDock is not capable of converting ionised centres back to the neutral form; these are unidirectional transformations.

Control of ligand non-polar hydrogens

By default, rDock uses an implicit non-polar hydrogen model for receptor and ligand, and all of the scoring function validation has been performed on this basis. If the -allH option is not defined (recommended), all explicit non-polar hydrogens encountered in the ligand input SD file are removed, and only the polar hydrogens (bonded to O, N, or S) are retained. If the -allH option is defined (not recommended), no hydrogens are removed from the ligand. Note that rDock is not capable of adding explicit non-polar hydrogens, if none exist. In other words, the -allH option disables hydrogen removal, it does not activate hydrogen addition. You should always make sure that polar hydrogens are defined explicitly. If the ligand input SD file contains no explicit non-polar hydrogens, the -allH option has no effect. Receptor protonation is controlled by the system definition prm file.
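The default hydrogen rule can be illustrated on a toy molecule. The element-list and bond-list representation below is hypothetical and chosen only to show the rule (keep H bonded to O, N or S, drop every other H); it does not reflect how rDock stores molecules.

```python
"""Sketch of the default (no -allH) ligand hydrogen handling described above.

The toy molecule format (element list + bond list) is hypothetical. Note that
nothing here ever adds hydrogens, matching rDock's behaviour.
"""
from typing import Dict, List, Tuple

POLAR_PARTNERS = {"O", "N", "S"}


def strip_nonpolar_h(elements: List[str], bonds: List[Tuple[int, int]],
                     all_h: bool = False) -> List[int]:
    """Return indices of the atoms kept; all_h=True mimics -allH."""
    if all_h:
        return list(range(len(elements)))  # -allH: no hydrogens are removed
    neighbours: Dict[int, List[int]] = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    kept = []
    for i, element in enumerate(elements):
        if element != "H":
            kept.append(i)  # heavy atoms are always kept
        elif any(elements[j] in POLAR_PARTNERS for j in neighbours[i]):
            kept.append(i)  # polar hydrogen (bonded to O, N or S)
    return kept
```

For methanol with explicit hydrogens, only the hydroxyl hydrogen survives alongside the C and O atoms.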

rbcavity

rbcavity – Cavity mapping and preparation of docking site (.as) file.

$RBT_ROOT/bin/rbcavity
{-r system definition .prm file}
[-ras -was -d -v -s]
[-l distance from cavity]
[-b border]

Exploration of cavity mapping parameters

rbcavity -r .prm file

You can run rbcavity with just the -r argument when first preparing a new receptor for docking. This allows you to explore rapidly the impact of the cavity mapping parameters on the generated cavities, whilst avoiding the overhead of actually writing the docking site (.as) file to disk. The number of cavities and volume of each cavity are written to standard output.

Visualisation of cavities

rbcavity -r .prm file -d

If you have access to InsightII you can use the -d option to dump the cavity volumes in InsightII grid file format. There is no need to write the docking site (.as) file first. The InsightII grid files should be loaded into the reference coordinate space of the receptor and contoured at a contour level of 0.99.

Writing the docking site (.as) file

rbcavity -r .prm file -was

When you are happy with the mapping parameters, use the -was option to write the docking site (.as) file to disk. The docking site file is a binary file that contains the cavity volumes in a compact format, and a pre-calculated cuboid grid extending over the cavities. The grid represents the distance from each point in space to the nearest cavity grid point, and is used by the cavity penalty scoring function. Calculating the distance grid can take a long time (whereas the cavity mapping itself is usually very fast), hence the -was option should be used sparingly.
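A brute-force sketch shows what the distance grid contains and why it is expensive: every grid point is compared with every cavity point. The cavity point list, the `step` spacing and the dictionary layout are assumptions for illustration; rbcavity's actual algorithm and .as file layout are not reproduced here.

```python
"""Brute-force sketch of the cavity distance grid stored in the .as file.

Assumptions: cavity points are (x, y, z) tuples in Angstrom, the cuboid is the
cavity bounding box padded by `border` (cf. rbcavity -b), and `step` is a
hypothetical grid spacing. Not rDock's implementation or file format.
"""
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]


def distance_grid(cavity: List[Point], border: float,
                  step: float) -> Dict[Point, float]:
    lo = [min(p[k] for p in cavity) - border for k in range(3)]
    hi = [max(p[k] for p in cavity) + border for k in range(3)]
    counts = [int(round((hi[k] - lo[k]) / step)) + 1 for k in range(3)]
    grid: Dict[Point, float] = {}
    for i in range(counts[0]):
        for j in range(counts[1]):
            for k in range(counts[2]):
                p = (lo[0] + i * step, lo[1] + j * step, lo[2] + k * step)
                # Distance to the nearest cavity point (0.0 on the cavity).
                grid[p] = min(math.dist(p, c) for c in cavity)
    return grid
```

The number of grid points grows with the cube of the padded box size divided by `step`, and each point scans all cavity points, which is consistent with the advice to use -was sparingly.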

Analysis of cavity atoms

rbcavity -r .prm file -ras -l distance

Use the -l option to list the receptor atoms within a given distance of any of the cavity volumes, for example to determine which receptor OH/NH3+ groups should be flexible. This option requires access to the pre-calculated distance grid embedded within the docking site (.as) file, and is best used in combination with the -ras option, which loads a previously generated docking site file. This avoids the time-consuming step of generating the cavity distance grid again. If -l is used without -ras, the cavity distance grid will be calculated on the fly each time.
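Once the distance grid exists, listing nearby atoms is only a lookup per atom, which is why loading a stored grid with -ras is cheap. In this sketch the grid is a dictionary keyed by integer (i, j, k) indices with a known origin and spacing, and the atom names are hypothetical; it is not rbcavity's implementation.

```python
"""Sketch of the rbcavity -l lookup against a precomputed distance grid.

Assumptions: the grid is a dict keyed by integer (i, j, k) indices with a
known origin and spacing; atom names and coordinates are hypothetical. Each
atom takes the value of its nearest grid point; no distances are recomputed.
"""
from typing import Dict, List, Tuple

Index = Tuple[int, int, int]
Point = Tuple[float, float, float]


def atoms_near_cavity(atoms: Dict[str, Point], grid: Dict[Index, float],
                      origin: Point, step: float,
                      max_dist: float) -> List[str]:
    hits = []
    for name, xyz in atoms.items():
        i, j, k = (round((xyz[a] - origin[a]) / step) for a in range(3))
        d = grid.get((i, j, k))  # None if the atom lies outside the grid
        if d is not None and d <= max_dist:
            hits.append(name)
    return hits
```

Atoms outside the grid are skipped; since the grid extends a border beyond the cavity, such atoms are farther away than any distance smaller than that border.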

Miscellaneous options

The -s option writes out various statistics on the cavity and on the receptor atoms in the vicinity of the cavity. These values have been used to build genetic programming models for removing false-positive docking poses. The -v option writes out the receptor coordinates in PSF/CRD format for use by the rDock Viewer (not documented here). Note that the PSF/CRD files are not suitable for simulation purposes, only for visualisation, as the atom types are not set correctly. The -b option controls the size of the cavity distance grid, and represents the border beyond the actual cavity volumes. It should not be necessary to vary this parameter (default = 8 Å) unless longer-range scoring functions are implemented.

rbcalcgrid

rbcalcgrid – Calculation of vdW grid files (usually called by make_grid.csh wrapper script).

$RBT_ROOT/bin/rbcalcgrid
{-r system definition file}
{-o output suffix for generated grids}
{-p vdW scoring function prm file}
[-g grid step]
[-b border]

Note that, unlike rbdock and rbcavity, spaces are not tolerated between the command-line options and their corresponding arguments. See $RBT_ROOT/bin/make_grid.csh for common usage.

make_grid.csh

Creates vdW grids for all receptor .prm files listed on the command line. Front-end to rbcalcgrid.

rbconvgrid

rbmoegrid

rbmoegrid – Calculates grids for a given atom type.

rbmoegrid -o <OutputRoot> -r <ReceptorPrmFile> -p <SFPrmFile> [-g <GridStep> -b <Border> -t <AtomType>]

-o <OutputRoot> - root name of the output grid file (.grd is appended)
-r <ReceptorPrmFile> - receptor param file (contains active site params)
-p <SFPrmFile> - scoring function param file (default calcgrid_vdw.prm)
-g <GridStep> - grid step (default = 0.5 Å)
-b <Border> - grid border around docking site (default = 1.0 Å)
-t <AtomType> - Tripos atom type (default is C.3)

sdrmsd

sdrmsd – calculation of ligand root mean squared displacement (RMSD) between reference and docked poses. It takes into account molecule topological symmetry. Requires Open Babel Python bindings.

$RBT_ROOT/bin/sdrmsd [options] {reference SD file} {input SD file}

With two arguments

sdrmsd calculates the RMSD between each record in the input SD file and the first record of the reference SD file. If the number of atoms does not match, the record is skipped and no RMSD is calculated. The RMSD is calculated over heavy (non-hydrogen) atoms only. Results are written to standard output; if any record was skipped, a warning is printed to standard error.
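The core heavy-atom RMSD calculation can be sketched in a few lines. This sketch deliberately omits what makes sdrmsd useful in practice: it ignores topological symmetry (it assumes atoms are listed in the same order in both poses) and it does not read SD files. The (element, coordinates) tuple format is an assumption for illustration.

```python
"""Minimal heavy-atom RMSD in the spirit of sdrmsd, without fitting.

Assumptions: each pose is a list of (element, (x, y, z)) tuples with atoms in
the same order in both poses. Unlike sdrmsd, this ignores topological
symmetry and does not read SD files.
"""
import math
from typing import List, Optional, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def heavy_atom_rmsd(ref: List[Atom], pose: List[Atom]) -> Optional[float]:
    ref_heavy = [xyz for el, xyz in ref if el != "H"]
    pose_heavy = [xyz for el, xyz in pose if el != "H"]
    if not ref_heavy or len(ref_heavy) != len(pose_heavy):
        return None  # mismatch: skip the record, as sdrmsd does
    sq = sum(math.dist(a, b) ** 2 for a, b in zip(ref_heavy, pose_heavy))
    return math.sqrt(sq / len(ref_heavy))
```

Hydrogen coordinates are ignored entirely, so a pose whose heavy atoms are each displaced by 1 Å has an RMSD of 1 Å regardless of where its hydrogens are.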

With fitting

A molecular superposition is performed before the RMSD is calculated. The output indicates that an RMSD FIT calculation was done.

sdrmsd -f reference.sdf input.sdf
sdrmsd --fit reference.sdf input.sdf

Output an SD file

This option writes the input molecules to an output SD file, adding an extra RMSD data field to each record. If fitting was done, the molecule coordinates are also fitted to the reference.

sdrmsd -o output.sdf reference.sdf input.sdf
sdrmsd --out=output.sdf reference.sdf input.sdf
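Appending a data field to an SD record follows the standard SD data-item layout: a "> <NAME>" header line, the value, then a blank line, all before the "$$$$" delimiter. The helper below is a hypothetical sketch of that step, not sdrmsd's code; the exact header spacing and number formatting sdrmsd uses are not specified here.

```python
"""Sketch of appending a data field (e.g. RMSD) to one SD record.

Assumptions: the record is held as text ending with the "$$$$" delimiter, and
the caller formats the value. `add_sd_field` is a hypothetical helper.
"""


def add_sd_field(record: str, name: str, value: str) -> str:
    text = record.rstrip()
    if not text.endswith("$$$$"):
        raise ValueError("SD record must end with $$$$")
    lines = text[: -len("$$$$")].rstrip("\n").split("\n")
    if lines[-1].strip() != "M  END":
        # The record already has data items; keep the blank line that
        # terminates the previous item.
        lines.append("")
    lines += [f"> <{name}>", value, "", "$$$$"]
    return "\n".join(lines) + "\n"
```

Keeping the blank line after each data item matters: without it, a reader would merge the new header into the previous item's value.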

sdtether

sdtether – Prepares a ligand SD file for tethered scaffold docking. Requires Open Babel Python bindings. Annotates ligand SD file with tethered substructure atom indices.

$RBT_ROOT/bin/sdtether {ref. SDfile} {in SDfile} {out SDfile} "{SMARTS query}"

sdtether performs the following actions:

  • Runs the SMARTS query against the reference SD file to determine the tethered substructure atom indices and coordinates.

  • If more than one substructure match is retrieved (e.g. due to topological symmetry, or if the query is too simple), all substructure matches are retained as the reference, and all ligands will be tethered according to all possible matches.

  • Runs the SMARTS query against each record of the input ligand SD file in turn.

  • For each substructure match, the ligand coordinates are transformed such that the principal axes of the matching substructure coordinates are aligned with the reference substructure coordinates.

  • In addition, an SD data field is added to the ligand record which lists the atom indices of the substructure match, for later retrieval by rDock.

  • Each transformed ligand is written to the output SD file.

  • Note that if the SMARTS query returns more than one substructure match for a ligand, that ligand is written multiple times to the output file, once for each match, each of which will be docked independently with different tethering information.
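The record bookkeeping in the last bullets (one output record per substructure match, each annotated with that match's atom indices) can be sketched as below. SMARTS matching and the principal-axes alignment are left out: matches are supplied as precomputed tuples. The data field name "TETHERED ATOMS" and dropping ligands with no match are assumptions, not statements from this manual.

```python
"""Sketch of sdtether's one-output-record-per-match bookkeeping.

Assumptions: matches are precomputed tuples of atom indices (no SMARTS or
alignment here); the field name "TETHERED ATOMS" is assumed; ligands with
no match are assumed to produce no output record.
"""
from typing import Dict, List, Tuple


def tether_records(ligands: Dict[str, List[Tuple[int, ...]]]
                   ) -> List[Tuple[str, Dict[str, str]]]:
    """Return (ligand name, data fields) for every record to be written."""
    out = []
    for name, matches in ligands.items():
        for match in matches:  # each match is docked independently
            fields = {"TETHERED ATOMS": ",".join(str(i) for i in match)}
            out.append((name, fields))
    return out
```

A ligand whose scaffold matches twice (for instance through symmetry) therefore appears twice in the output, once per set of tethered atom indices.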

sdfilter

sdfilter – Post-process an SD file by filtering the records according to data fields or attributes.

sdfilter -f '$<DataField> <Operator> <Value>' [-s <DataField>] [sdFiles]

or

sdfilter -f <filename> [-s <DataField>] [sdFiles]

Note

Multiple filters are allowed and are OR’d together. Filters can be provided in a file, one per line. Standard Perl operators should be used, e.g.:

eq ne lt gt le ge # for strings
== != < > <= >=   # for numeric

_REC (record number) is provided as a pseudo-data field. If the -s option is used, _COUNT (number of occurrences of the DataField value) is also provided as a pseudo-data field. If no SD files are given, input is read from standard input. Output is to standard output.

For example, if results.sd contains multiple ligands, each having multiple poses (ordered by score), then running

sdfilter -f'$_COUNT == 1' -s <NameField> results.sd

where <NameField> is the data field holding the ligand name, will get you the first entry for each ligand.
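The effect of the _COUNT filter can be sketched as a running count per ligand name. This mirrors the semantics described above rather than sdfilter's Perl implementation; records are plain dictionaries of data fields, already ordered by score, and _COUNT is assumed to be a running count as records stream past.

```python
"""Sketch of how the _COUNT pseudo-field selects the first pose per ligand.

Assumptions: records are dicts of data fields already ordered by score, `key`
stands in for the field passed to -s, and _COUNT is a running count.
"""
from collections import defaultdict
from typing import Dict, List


def first_per_ligand(records: List[Dict[str, str]],
                     key: str) -> List[Dict[str, str]]:
    seen: Dict[str, int] = defaultdict(int)
    kept = []
    for rec in records:
        seen[rec[key]] += 1      # running count for this key value (_COUNT)
        if seen[rec[key]] == 1:  # filter expression: $_COUNT == 1
            kept.append(rec)
    return kept
```

Because only the first occurrence of each name passes, the input must already be sorted so that each ligand's best pose comes first.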

sdreport

sdreport – Produces text summaries of SD records.

sdreport [-l] [-t [<FieldName, FieldName...>]] [-c <FieldName, FieldName...>] [-id <IDField>] [-nh] [-o] [-norm] [-s] [-sup] [sdFiles]

-l (list format) output all data fields for each record as processed
-t (tab format) tabulate selected fields for each record as processed
-c (csv format) comma delimited output of selected fields for each record as processed
-s (summary format) output summary statistics for each unique value of ligand ID
-sup (supplier format) tabulate supplier details (from Catalyst)
-id <IDField> data field to use as ligand ID
-nh don't output column headings in -t and -c formats
-o use old (v3.00) score field names as default columns in -t and -c formats, else use v4.00 field names
-norm use normalised score field names as default columns in -t and -c formats (normalised = score / #ligand heavy atoms)

Note

If -l, -t or -c are combined with -s, the listing/table is output within each ligand summary. -sup should not be combined with other options. Default field names for -t and -c are RiboDock score field names. Default ID field name is Name. If no sdFiles are given, input is read from standard input. Output is to standard output.
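The normalisation used by -norm is a single division: each score divided by the number of ligand heavy atoms. The worked sketch below applies that definition; the ".norm" suffix and the example values are hypothetical, not the field names sdreport actually prints.

```python
"""Worked sketch of the -norm definition: score / number of ligand heavy atoms.

The ".norm" suffix and the example values are hypothetical.
"""
from typing import Dict


def normalise(scores: Dict[str, float], n_heavy: int) -> Dict[str, float]:
    if n_heavy <= 0:
        raise ValueError("ligand must have at least one heavy atom")
    return {f"{name}.norm": value / n_heavy for name, value in scores.items()}
```

For a ligand with 20 heavy atoms, a SCORE of -30 normalises to -1.5, which makes scores of ligands of different sizes easier to compare.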

sdsplit

sdsplit – Splits SD records into multiple files of equal size.

sdsplit [-<RecSize>] [-o <OutputRoot>] [sdFiles]

-<RecSize> record size to split into (default = 1000 records)
-o <OutputRoot> Root name for output files (default = tmp)

Note

If no SD files are given, input is read from standard input.
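Splitting works on whole records, which SD files delimit with "$$$$" lines. The sketch below chunks SD text into groups of `rec_size` records, using the documented defaults (1000 records, root "tmp"); the output file naming scheme `<root><n>.sd` is a hypothetical choice, so check the names sdsplit actually writes.

```python
"""Sketch of sdsplit-style chunking of SD text into files of N records.

Assumptions: records end with a "$$$$" line; the <root><n>.sd naming is
hypothetical. Trailing text without a "$$$$" terminator is ignored.
"""
from typing import Dict, List


def split_sd(text: str, rec_size: int = 1000,
             root: str = "tmp") -> Dict[str, str]:
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "$$$$":  # end of one SD record
            records.append("".join(current))
            current = []
    chunks = {}
    for n, start in enumerate(range(0, len(records), rec_size), start=1):
        chunks[f"{root}{n}.sd"] = "".join(records[start:start + rec_size])
    return chunks
```

The last file holds the remainder, so it may contain fewer than `rec_size` records.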

sdsort

sdsort – Sorts SD records by given data field.

sdsort [-n] [-r] [-s] [-id <NameField>] [-f <DataField>] [sdFiles]

-n numeric sort (default is text sort)
-r descending sort (default is ascending sort)
-f <DataField> specifies sort field
-s fast mode. Sorts the records for each named compound independently (must be consecutive)
-id <NameField> specifies compound name field (default = 1st title line)

Note

_REC (record number) is provided as a pseudo-data field. If no SD files are given, input is read from standard input. Output is to standard output. Fast mode can safely be used for partial sorting of huge SD files of raw docking hits without running into memory problems.
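Fast mode's memory behaviour follows from sorting one block of consecutive same-name records at a time. In this sketch, records are dictionaries, `id_field` plays the role of -id, `field` of -f, and the sort is numeric as with -n; it is an illustration of the idea, not sdsort's implementation. If `records` is a lazy iterator, only one compound's block is held in memory at a time.

```python
"""Sketch of sdsort's fast mode (-s): sort each compound's consecutive block.

Assumptions: records are dicts; `id_field` stands in for -id, `field` for -f,
and the sort is numeric (cf. -n); `descending` mimics -r.
"""
from itertools import groupby
from typing import Dict, Iterable, Iterator


def fast_sort(records: Iterable[Dict[str, str]], field: str, id_field: str,
              descending: bool = False) -> Iterator[Dict[str, str]]:
    # groupby only merges *consecutive* records with the same name.
    for _, block in groupby(records, key=lambda r: r[id_field]):
        yield from sorted(block, key=lambda r: float(r[field]),
                          reverse=descending)
```

If a compound's records are not consecutive, each run of records is sorted separately, which is why the manual requires them to be consecutive.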

sdmodify

sdmodify – Script to set the first title line equal to a given data field.

sdmodify -f <DataField> [sdFiles]

Note

If no sdFiles are given, input is read from standard input. Output is to standard output.

rbhtfinder

rbhtfinder – Script that simulates the result of a high-throughput protocol.

1st) Perform exhaustive docking of a small, representative part of
     the whole library.
2nd) Save the output of sdreport -t for that exhaustive docking to
     a file; this file is the input of this script.
3rd) rbhtfinder <sdreport_file> <output_file> <thr1max> <thr1min> <ns1> <ns2>
     <ns1> and <ns2> are the number of steps (docking runs) in
     stage 1 and in stage 2. If not given, the defaults are 5 and 15.
     <thr1max> and <thr1min> set the range of thresholds that will
     be simulated in stage 1. The threshold of stage 2 depends
     on the value of the threshold of stage 1.
     An input of -22 -24 will try the protocols:
             5   -22     15      -27
             5   -22     15      -28
             5   -22     15      -29
             5   -23     15      -28
             5   -23     15      -29
             5   -23     15      -30
             5   -24     15      -29
             5   -24     15      -30
             5   -24     15      -31
     The program outputs 7 columns. The first column is the
     time, as a percentage of the time it would take to run the
     docking in exhaustive mode, i.e. docking each ligand 100
     times. Anything above 12% is too long.
     The second column (perc1) is the percentage of ligands that
     pass the first stage.
     The third column (perc2) is the percentage of ligands that
     pass the second stage.
     The last four columns define the protocol.
     All the protocols tried are written at the end.
     Protocols for which the time is below 12%, perc1 is below
     30%, and perc2 is below 5% but above 1% are followed by
     ***, to indicate that they are good choices.
     WARNING! This is a simulation based on a small set.
     The numbers are an indication, not exact values.
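The simulation described above can be approximated as follows. Several details are assumptions, not taken from the rbhtfinder source: a ligand passes a stage if its best (lowest) SCORE.INTER within that stage's runs is below the threshold, ligands failing a stage cost that stage's runs, survivors cost the full 100 exhaustive runs, and the stage-2 offsets of -5, -6 and -7 are read off the example table.

```python
"""Rough sketch of an rbhtfinder-style protocol simulation.

Assumptions (not from the rbhtfinder source): per-ligand SCORE.INTER lists come
from 100 exhaustive runs; a stage is passed if the best score within that
stage's runs is below its threshold; failures cost the stage's runs,
survivors cost all 100; stage-2 offsets are inferred from the example.
"""
from typing import Dict, List, Tuple

EXHAUSTIVE_RUNS = 100


def simulate(scores: Dict[str, List[float]], ns1: int, thr1: float,
             ns2: int, thr2: float) -> Tuple[float, float, float]:
    """Return (time %, % passing stage 1, % passing stage 2)."""
    cost = passed1 = passed2 = 0
    for runs in scores.values():
        if min(runs[:ns1]) >= thr1:
            cost += ns1  # stopped after stage 1
            continue
        passed1 += 1
        if min(runs[:ns2]) >= thr2:
            cost += ns2  # stopped after stage 2
            continue
        passed2 += 1
        cost += EXHAUSTIVE_RUNS
    n = len(scores)
    return (100.0 * cost / (n * EXHAUSTIVE_RUNS), 100.0 * passed1 / n,
            100.0 * passed2 / n)


def candidate_protocols(thr1max: int, thr1min: int, ns1: int = 5,
                        ns2: int = 15) -> List[Tuple[int, int, int, int]]:
    """Protocols as (ns1, thr1, ns2, thr2), matching the example table."""
    return [(ns1, t1, ns2, t1 - off)
            for t1 in range(thr1max, thr1min - 1, -1) for off in (5, 6, 7)]


def is_good(time_pct: float, perc1: float, perc2: float) -> bool:
    """Criteria quoted above for marking a protocol with ***."""
    return time_pct < 12 and perc1 < 30 and 1 < perc2 < 5
```

For example, `candidate_protocols(-22, -24)` reproduces the nine protocols listed above, from (5, -22, 15, -27) to (5, -24, 15, -31).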

An example filter definition file looks as follows:

# 3 steps as the running filters (set by the "3" in next line)
3
if - -10 SCORE.INTER 1.0 if - SCORE.NRUNS 9 0.0 -1.0,
if - -20 SCORE.INTER 1.0 if - SCORE.NRUNS 14 0.0 -1.0,
if - SCORE.NRUNS 49 0.0 -1.0,
# 1 writing filter (defined by the "1" in next line)
1
- SCORE.INTER -10,

In plain words:

First, rDock runs 3 consecutive steps:

  1. Perform 10 runs and check whether SCORE.INTER is lower than -10. If it is:

  2. Perform 5 more runs (15 in total) and check whether SCORE.INTER reaches -20. If it does:

  3. Continue up to 50 runs in total to freely sample the different conformations of the molecule.

Second, only poses where SCORE.INTER is better (lower) than -10 are written to the output file, to avoid excessive output.
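The filter expressions in the example are written in prefix notation. The small evaluator below is a sketch whose grammar is inferred from this example only: "if c a b" yields a when c > 0 and b otherwise, "-" is binary subtraction, other tokens are numbers or score components, and a trailing comma ends the expression. rDock supports more than this. Reading the plain-words description, 1.0 plausibly means "go on to the next stage", 0.0 "stop docking this ligand" and -1.0 "keep running"; treat that mapping as an assumption.

```python
"""Prefix-notation evaluator for filter expressions like those above.

Assumptions: grammar inferred from the example only ("if c a b" -> a when
c > 0, else b; "-" is binary subtraction; other tokens are numbers or
score-component names looked up in `values`; a trailing comma is ignored).
"""
from typing import Dict, List


def evaluate(expr: str, values: Dict[str, float]) -> float:
    tokens: List[str] = expr.strip().rstrip(",").split()
    pos = 0

    def parse() -> float:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "if":
            cond, yes, no = parse(), parse(), parse()
            return yes if cond > 0 else no
        if tok == "-":
            left = parse()
            return left - parse()
        try:
            return float(tok)
        except ValueError:
            return values[tok]  # a score component such as SCORE.INTER

    return parse()
```

Applied to the first running filter, a ligand with SCORE.INTER of -12 evaluates to 1.0, while a ligand stuck at -5 evaluates to -1.0 before its tenth run and 0.0 from then on.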

rblist

rblist – Output interaction center info for ligands in SD file (with optional autoionisation).

rblist -i <InputSDFile> [-o <OutputSDFile>] [-ap] [-an] [-allH] [-tr] [-l]

-i <InputSDFile> - input ligand SD file
-o <OutputSDFile> - output SD file with descriptors (default = no output)
-ap - protonate all neutral amines, guanidines, imidazoles (default = disabled)
-an - deprotonate all carboxylic, sulphur and phosphorus acid groups (default = disabled)
-allH - read all hydrogens present (default = polar hydrogens only)
-tr - rotate all secondary amides to trans (default = leave alone)
-l - verbose listing of ligand atoms and rotatable bonds (default = compact table format)