1. Introduction

This manual describes how to submit jobs to the RaptorX-Binding server and retrieve the results. For details on our structure prediction server, RaptorX, consult http://raptorx.uchicago.edu/documentation/

The manual is composed of the following sections:

Submitting a New Job
    How to submit protein sequences for prediction of the binding site
Monitoring Job Status and Availability
    How to obtain information on the status of your submitted jobs
Interpreting Results
    An overview of a job result page with explanations

2. Submitting a New Job

To submit a job, click "New Job" in the top menu of this page. This displays a form (depicted below) for submitting a protein sequence for binding site prediction. The numbers in superscript used in this section correspond to the labels in the figure.

/site_media/bindingmanual/new_job1.PNG

Currently, the server accepts only one sequence per prediction job. The sequence can be provided either by entering it in the textbox labeled "Sequence"¹ or by submitting a file containing the sequence².
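Because only one sequence is accepted per job, it can help to check the input locally before pasting or uploading it. The following is a minimal sketch, not part of the server: the function name `read_single_sequence` is hypothetical, and it assumes that the input is either a raw sequence or a FASTA-style record with a ">" header line, and that only the 20 standard amino acid letters are expected. The server's actual input rules may differ.

```python
# Assumption: only the 20 standard amino acid one-letter codes are expected.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def read_single_sequence(text):
    """Return the single protein sequence found in `text`.

    Accepts a raw sequence or a FASTA-style record (assumed format).
    Raises ValueError if zero or several sequences are found, or if the
    sequence contains characters outside VALID_RESIDUES.
    """
    header_count = 0
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header_count += 1  # each ">" line starts a new FASTA record
            continue
        chunks.append(line.upper())

    if header_count > 1:
        raise ValueError("more than one sequence found; submit one per job")

    sequence = "".join(chunks)
    if not sequence:
        raise ValueError("no sequence found")

    unexpected = set(sequence) - VALID_RESIDUES
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return sequence
```

For example, `read_single_sequence(">query\nMKT\nAYI\n")` joins the two sequence lines into one string, while an input with two ">" headers is rejected.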

After the "Submit" button is pressed, a unique JobID is provided to the user. The user should make sure to save this JobID, as it is currently required to retrieve the results of the binding site prediction and to monitor the progress of the job.

While the user is not required to enter an email address³, it is recommended that they do so and tick the "Notify me by email when my jobs are done" checkbox⁴. In this case the server will email the link to the job page once the job is done. This ensures that the user can retrieve their job results even if the JobID is lost. An optional Jobname⁵ can also be provided for the convenience of the user.

3. Monitoring Job Status and Availability

After submitting the sequence, the user is redirected to the status page of the submitted job, similar to the one shown below. As mentioned in Section 2 of this document, the user should take care to save the JobID provided on this page (labeled 1 in the figure). The status page of the job can also be accessed by clicking "Job Status" in the top menu of the server web page and submitting the JobID.

/site_media/bindingmanual/Status3.PNG

4. Interpreting Results

After the job is completed, the user can review the results by clicking the sequence name. This displays a summary page similar to the one depicted below (the numbers in superscript used in this section correspond to the labels in the figure).

/site_media/bindingmanual/Results4.PNG

For binding site predictions, a protein structure is built from the top-ranked template, which is chosen by aligning the target sequence to the structures in the template library. In cases where multiple domains are found, a structure is built and binding site predictions are made for each domain. The domain assignment of each residue can be seen at the top of the summary page, under the input sequence string. The PDB code of the top-ranked template for the structure model is indicated under Prediction Results of each available domain, in the top left corner¹. Clicking the PDB code of the top-ranked template takes the user to the structure record at the Protein Data Bank (http://www.pdb.org).

The P-value², uGDT and GDT³ of the query alignment with the top-ranked template are presented as an assessment of the quality of the resulting model structure. The uGDT is the unnormalized GDT (Global Distance Test) score, defined as 1*N(1)+0.75*N(2)+0.5*N(4)+0.25*N(8), where N(x) is the number of residues with local RMSD smaller than x Å. GDT is calculated as uGDT divided by the domain length and multiplied by 100.
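The uGDT and GDT definitions can be illustrated with a short calculation. The sketch below is not the server's code; the function names and the input (a list of per-residue local errors in Å) are hypothetical. It also makes one interpretive assumption: each residue is counted once, at the smallest threshold it satisfies. With cumulative counts a perfect model would score 2.5 per residue, whereas counting each residue once caps GDT at 100 and agrees with the usual GDT_TS average over the four cutoffs.

```python
# (threshold, weight) pairs taken from the uGDT formula, smallest first.
THRESHOLDS_AND_WEIGHTS = [(1.0, 1.0), (2.0, 0.75), (4.0, 0.5), (8.0, 0.25)]


def ugdt(local_errors):
    """Unnormalized GDT for a list of per-residue local errors.

    Assumption: a residue contributes the weight of the smallest
    threshold its error falls below, and nothing if it is >= 8.
    """
    total = 0.0
    for error in local_errors:
        for threshold, weight in THRESHOLDS_AND_WEIGHTS:
            if error < threshold:
                total += weight
                break  # count each residue only once
    return total


def gdt(local_errors, domain_length):
    """GDT: uGDT divided by the domain length and multiplied by 100."""
    return 100.0 * ugdt(local_errors) / domain_length


# Hypothetical five-residue domain with one residue in each error range.
errors = [0.5, 1.5, 3.0, 6.0, 10.0]
print(ugdt(errors))     # 1 + 0.75 + 0.5 + 0.25 + 0 = 2.5
print(gdt(errors, 5))   # 2.5 / 5 * 100 = 50.0
```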

Pocket Multiplicity⁵ represents the frequency with which the selected pocket was found in the template structures. Figure C depicts the recall/precision of RaptorX-Binding using pocket multiplicity (Ntot in the figure) as a predictor of binding pockets, tested on the 251 binding site benchmark. As the figure shows, when pocket multiplicity is above 40, there is a good chance that the predicted pocket is a true binding pocket.

To interpret the alignment p-value, consult the following two figures, derived from the official CASP10 domain threading results. Here, uGDT is calculated between the native structure and the model built from the top-ranked template. From Figure A it can be seen that for values of -log(p-value) greater than 4, 95% of the models have uGDT greater than 50. On the other hand, for values of -log(p-value) less than 4, 98% of the models have uGDT less than 50. This indicates that if a model has uGDT greater than 50, it can be considered reasonably good. Figure B is an inset of Figure A illustrating the region with uGDT < 100 and -log(p-value) < 6.
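The threshold on -log(p-value) can be applied to a reported p-value as in the sketch below. The function names are hypothetical, and the base of the logarithm is an assumption: this manual does not state it, and base 10 is used here purely for illustration.

```python
import math


def neg_log_p(p_value):
    """Return -log10(p_value). Base 10 is an assumption, not confirmed."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    return -math.log10(p_value)


def above_reliability_threshold(p_value, threshold=4.0):
    """True if -log(p-value) exceeds the threshold of 4 discussed above."""
    return neg_log_p(p_value) > threshold


# Hypothetical p-values on either side of the threshold.
print(above_reliability_threshold(1e-6))  # True
print(above_reliability_threshold(1e-2))  # False
```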

Figure A:

/site_media/bindingmanual/uGDT-pvalue-casp10.bmp

Figure B:

/site_media/bindingmanual/uGDT-pvalue-casp10_inset_color.bmp

Figure C:

/site_media/bindingmanual/PockMultPrecisionRecall.jpg

By clicking the "Pocket/Ligand" button⁶, the user can select a pocket and a ligand to visualize with the predicted structure in the Jmol viewer loaded underneath. The pockets are listed in order of their likelihood of being a binding site. Similarly, ligands are listed in order of their likelihood of binding the query protein. After the selection is made, the pocket rank⁴ and the ligand name¹³ are displayed above the Jmol viewer. The residues within the pocket are depicted in "ball-and-stick" representation for easy identification. Using the mouse, the user can rotate the structure and zoom in on it. To the right of the structure viewer, a menu for controlling the representation of the currently selected structures is available⁷. Right-clicking the model brings up a menu of further options for changing the visualization.

The alignment of the target and template sequences used for constructing the 3D model is displayed below the Jmol viewer⁸. Each position in the alignment is color-coded according to the chemical nature of the residue. The scheme used is: Red=Hydrophobic, Blue=Acidic, Magenta=Basic, Green=Hydroxyl+Amine. Template insertions are omitted. The next row contains the predicted RMSD at each aligned position, rounded to the nearest integer, as an indicator of reliability. In the last row, a '*' under a target residue signifies a predicted binding residue in the selected pocket. Hovering over aligned residues highlights the target residue in the Jmol viewer.
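The '*' row can be read programmatically if the alignment is copied out as text. The following is a minimal sketch under stated assumptions: the function name `binding_residues` is hypothetical, the target row and the marker row are assumed to be plain strings of aligned columns, and positions are counted from 1 within the aligned target sequence. The server does not necessarily export the alignment in this form.

```python
def binding_residues(target_row, marker_row):
    """List (position, residue) pairs marked with '*' in the marker row.

    Assumptions: both rows are column-aligned strings, '-' in the target
    row is a gap, and positions are 1-based within the target sequence.
    """
    # Pad the marker row in case trailing spaces were trimmed on copy.
    marker_row = marker_row.ljust(len(target_row))
    hits = []
    position = 0
    for residue, marker in zip(target_row, marker_row):
        if residue == "-":
            continue  # gaps do not advance the residue position
        position += 1
        if marker == "*":
            hits.append((position, residue))
    return hits


# Hypothetical eight-residue target with two marked columns.
print(binding_residues("MKTAYIAK", " *  *"))  # [(2, 'K'), (5, 'Y')]
```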

The right-hand column provides information on the status of the prediction job⁹, a brief user's guide for the Jmol viewer¹⁰ and the sequence box¹¹, as well as links for downloading the prediction results¹². The results available for download include the PDB files of the constructed 3D models for each domain; the ligand PDB files, rotated to their putative binding poses with the protein structure and grouped into sets according to their predicted pockets; and files containing the binding residues for each predicted pocket.

5. Example Result Pages

Please click the following links to see working examples.