http://www.abbs.info e-mail:[email protected]

ISSN 0582-9879 ACTA BIOCHIMICA et BIOPHYSICA SINICA 2003, 35(1): 35-40 CN 31-1300/Q

A Soft Docking Algorithm for Predicting the Structures of Protein-protein Complexes

LI Chun-Hua, MA Xiao-Hui, CHEN Wei-Zu, WANG Cun-Xin*

(Center for Biomedical Engineering, Beijing Polytechnic University, Beijing 100022, China)

Abstract An efficient soft docking algorithm is described that predicts the binding mode of two proteins from their three-dimensional structures. The molecular model used in this work was based on the “simplified protein” model used in Janin’s docking algorithm. The side-chain flexibility of the residues Arg, Lys, Asp, Glu and Met at the protein surface was taken into account by softening the molecular surface. A double filtering technique was used to eliminate most of the unlikely binding modes. The retained structures were energy-minimized and then evaluated with a scoring function that includes electrostatic, desolvation and van der Waals energy terms. The algorithm was tested on 26 complexes. Native-like conformations were found for all of them, and for 20 complexes a native-like conformation was ranked in the top 10.

Key words protein-protein interactions; molecular recognition; molecular flexibility; binding free energy; soft docking

Protein-protein interactions play an important role in many physiological processes such as signal transduction, cell regulation and the immune response. Tremendous experimental and computational efforts^[1–5] have been devoted to studying protein-protein association, with the goal of scientific and commercial breakthroughs in drug discovery. Because the structures of protein-protein complexes are difficult to determine by X-ray crystallography or NMR spectroscopy, docking methods that predict protein-protein recognition have wide applications^[6].

In general, a docking algorithm can be divided into three stages: searching, filtering and scoring. A full conformational search is a hard problem, even when the crucial effect of solvent is neglected, because of the large number of atoms and degrees of freedom involved. Fortunately, large conformational changes during protein-protein association are frequently confined to the protein surface, especially to flexible amino acid side chains^[7,8]. Several techniques using a ‘soft’ representation of the molecular surface^[9–14] have been developed to tolerate a limited degree of molecular flexibility. Jiang and Kim^[9] used a cube representation of the molecular surface and volume in their soft docking procedure. Ritchie and Kemp^[10] introduced a ‘soft’ model of electrostatic complementarity into their algorithm, and Palma et al.^[11] proposed a surface-implicit method to embody the “softness” of the molecular surface. In this paper, the surfaces of the flexible residues Arg, Lys, Asp, Glu and Met at the protein surface are softened in the molecular model, which partly addresses the unbound docking problem.

A search procedure may produce millions of binding modes, so the number of solutions must be drastically reduced at the filtering stage. So far, most docking algorithms have used the geometric complementarity of protein surfaces as the filtering criterion for selecting potential solutions. It is generally recognized, however, that geometric complementarity alone is not sufficient to discriminate between correct and incorrect docked structures except in a very few cases^[15]. In this article, besides geometric complementarity, the residue pairing preferences^[16] at protein-protein interfaces are taken into account at the filtering stage. This double filtering technique retains many more native-like structures than filtering based solely on geometric complementarity.

Developing a scoring function that reliably distinguishes correct docked structures from incorrect ones is a challenging topic of current research. Many scoring functions have been proposed based on geometric complementarity^[9,17] or electrostatic interactions^[14,15,18]. In this work, a combination of molecular potential energy and desolvation energy is used as the scoring function to rank the putative docked structures.

1 Materials and Methods

1.1 The selected test systems

The 26 protein-protein complexes used to test the docking algorithm were selected from the Protein Data Bank and are listed in Table 1. They include enzyme-inhibitor, antibody-antigen and other complexes. To test the ability of the program to handle the conformational changes that occur upon complex formation, three kinds of docking tests were performed. For the first 6 cases, marked XX, the complexes were reconstructed from the bound structures of both receptor and ligand; this set of docking simulations is designated BOUND in Table 1. For the next 7 cases, marked FX or XF, the complex structures were predicted from the unbound/bound or bound/unbound conformations of the two interacting proteins (PSEUDO-UNBOUND). For the last 13 cases, marked FF, the unbound structures of both proteins were used (UNBOUND). Here, F and X denote the unbound and bound structures, respectively.

Table 1 The 26 protein-protein complexes used to test the docking algorithm

Case^a	Description	Receptor	Receptor residues	Ligand	Ligand residues
BOUND
1CHOXX	α-Chymotrypsin/Ovomucoid	1cho^b	245	1cho^b	53
2SICXX	Subtilisin/Streptomyces inhibitor	2sic^b	275	2sic^b	107
1ACBXX	α-Chymotrypsin/Eglin C	1acb^b	245	1acb^b	63
2SNIXX	Subtilisin/Chymotrypsin inhibitor	2sni^b	275	2sni^b	64
2PTCXX	β-Trypsin/Pancreatic trypsin inhibitor	2ptc^b	223	2ptc^b	58
1TECXX	Thermitase/Eglin C	1tec^b	279	1tec^b	70
PSEUDO-UNBOUND
1UDIFX	Viral uracil-DNA glycosylase/inhibitor	1udh	228	1udi^b	83
1JHLXF	IgG1 Fv Fragment/Lysozyme	1jhl^b	224	1ghl	129
1TABFX	Trypsin/BBI	3ptn	223	1tab^b	36
1BRCFX	Trypsin/APPI	1bra	223	1brc^b	56
1GLAXF	Glycerol kinase/GSF III	1gla^b	489	1f3g	150
1TGSFX	Trypsinogen/pancreatic trypsin inhibitor	1tgt	225	1tgs^b	56
3HFLXF	Fab HyHel-5 (L,H chains)/lysozyme	3hfl^b	427	1lza	129
UNBOUND
1BRCFF	Trypsin/APPI	1bra	223	1aap	56
1FSSFF	Acetylcholinesterase/Fasciculin II	2ace	523	1fsc	61
1FQ1FF	CDK2/KAP	1b39	290	1fpz	178
1CSEFF	Subtilisin Carlsberg/Eglin C	1scd	274	1acb	63
1MAHFF	Mouse Acetylcholinesterase/inhibitor	1maa	536	1fsc	61
1MLCFF	Fab D44.1 (A,B chains)/lysozyme	1mlb	432	1lza	129
2PTCFF	β-Trypsin/pancreatic trypsin inhibitor	3ptn	223	4pti	58
1AHWFF	Antibody Fab 5G9/Tissue factor	1fgn	221	1boy	211
2KAIFF	Kallikrein A/pancreatic trypsin inhibitor	2pka	232	1bpi	57
1CHOFF	α-Chymotrypsin/Ovomucoid	5cha	237	2ovo	53
1MDAFF	Methylamine dehydrogenase/Amicyanin	2bbk	470	1aan	103
1BRBFF	Trypsin/pancreatic trypsin inhibitor	1bra	223	1bpi	51
1CGIFF	α-Chymotrypsinogen/pancreatic trypsin inhibitor	1chg	245	1hpt	56

^aPDB code of the complex. ^bProtein taken from the bound structure.

1.2 Treating molecular flexibility

Because the surface residues Arg, Lys, Asp, Glu and Met are much more flexible than other residues^[19], they were treated specially in our molecular model, which is based on the “simplified protein” model with one sphere per residue^[1]. Each of these residues was represented by a sphere centered at the Cβ atom of its side chain, with a radius of 0.15 nm (smaller than in Janin’s original molecular model). This softens the molecular surface to some extent.
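A minimal Python sketch of the softened one-sphere-per-residue model follows. It assumes (this is not stated in the text) that residues which are not softened keep a sphere at their side-chain centroid with a type-dependent radius; the `Residue` record, the `on_surface` flag and the `default_radius` values are hypothetical inputs, since the text does not specify how surface residues were identified or what the original radii were.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Residue types treated as flexible when they lie on the protein surface.
FLEXIBLE = {"ARG", "LYS", "ASP", "GLU", "MET"}
SOFT_RADIUS_NM = 0.15  # radius of a softened residue sphere (from the text)


@dataclass
class Residue:
    name: str              # three-letter code, e.g. "ARG"
    on_surface: bool       # hypothetical flag, e.g. from a solvent-accessibility cutoff
    cb: Vec3               # C-beta coordinates (nm)
    centroid: Vec3         # side-chain centroid (nm), assumed original sphere center
    default_radius: float  # hypothetical per-type radius of the original model (nm)


@dataclass
class Sphere:
    center: Vec3
    radius: float


def build_spheres(residues: Sequence[Residue]) -> List[Sphere]:
    """One sphere per residue; flexible surface residues are moved onto C-beta
    and shrunk to SOFT_RADIUS_NM, which softens the molecular surface."""
    spheres: List[Sphere] = []
    for res in residues:
        if res.on_surface and res.name in FLEXIBLE:
            spheres.append(Sphere(res.cb, SOFT_RADIUS_NM))
        else:
            spheres.append(Sphere(res.centroid, res.default_radius))
    return spheres
```

Buried copies of the flexible residue types keep their original sphere, so only the part of the surface most likely to rearrange on binding is softened.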

1.3 Searching

The six rigid-body docking parameters that define the position and orientation of one molecule relative to the other were five rotational angles (θ1, φ1, θ2, φ2 and χ) and an intermolecular distance ρ^[1]. θ1 and φ1 located the center of the ligand relative to the receptor, θ2 and φ2 located the center of the receptor relative to the ligand, and χ was a spin angle about the line joining the two centers. The five angles were searched systematically in steps of 10.0°. The search ranges of θ1 and φ1 were limited to ±20° around the active site, and the ranges of the other angles were ±90° for θ2 and ±180° for φ2 and χ. About 2.3×10⁵ different binding modes were thus generated for each complex.
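The angular search can be sketched as a rectangular grid over the five angles. This is an illustration under stated assumptions, not the authors' sampling scheme: the active-site direction (`theta1_0`, `phi1_0`) is a hypothetical input, ρ is not enumerated (the text does not say how it was set for each orientation), and the periodic ±180° ranges drop the duplicate endpoint. This naive grid yields 615,600 orientations, more than the roughly 2.3×10⁵ binding modes reported, so the authors' grid evidently differs in details the text does not give.

```python
import itertools
from typing import Iterator, List, Tuple

STEP_DEG = 10  # angular step used in the text


def angle_values(half_range: int, step: int = STEP_DEG, periodic: bool = False) -> List[int]:
    """Angles from -half_range to +half_range inclusive; for a full periodic
    360-degree range, +180 is dropped because it duplicates -180."""
    values = list(range(-half_range, half_range + 1, step))
    if periodic and half_range == 180:
        values = values[:-1]
    return values


def orientation_grid(theta1_0: float = 0.0, phi1_0: float = 0.0) -> Iterator[Tuple[float, ...]]:
    """Yield (theta1, phi1, theta2, phi2, chi) in degrees.

    theta1/phi1 are restricted to +/-20 degrees around a hypothetical
    active-site direction (theta1_0, phi1_0); the other angles use the
    ranges given in the text.
    """
    theta1 = [theta1_0 + a for a in angle_values(20)]
    phi1 = [phi1_0 + a for a in angle_values(20)]
    theta2 = angle_values(90)
    phi2 = angle_values(180, periodic=True)
    chi = angle_values(180, periodic=True)
    yield from itertools.product(theta1, phi1, theta2, phi2, chi)


n = sum(1 for _ in orientation_grid())
print(n)  # 615600 for this naive rectangular grid (5 * 5 * 19 * 36 * 36)
```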

1.4 Filtering

Before filtering, conformations with an interface area of less than 5 nm² were eliminated. A sub-population of 1000 binding modes was then selected from the remaining structures by the double filtering technique.

In this work, we compared the performance of the double filter with that of a surface matching filter based only on geometric complementarity, measured by the interface area. In the surface matching filter, an initial set of 1000 solutions was first sorted in descending order of interface area to form a list. Each remaining solution was then compared with the last solution in the list: if its surface matching was worse, it was discarded; otherwise, it was inserted into the list according to its interface area, and the last solution in the list was removed.
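The surface matching filter amounts to maintaining a bounded list sorted by interface area. The Python sketch below also applies the 5 nm² pre-filter; it treats interface area as the surface matching index, and the `(area, mode_id)` record is a hypothetical layout.

```python
import bisect
from typing import Iterable, List, Tuple

Solution = Tuple[float, int]  # (interface_area_nm2, mode_id); hypothetical record


def surface_matching_filter(solutions: Iterable[Solution], capacity: int = 1000,
                            min_area: float = 5.0) -> List[Solution]:
    """Keep the `capacity` solutions with the largest interface area.

    Modes below `min_area` (nm^2) are dropped first. `keys` stores -area in
    ascending order so that bisect keeps `kept` in descending area order.
    """
    keys: List[float] = []
    kept: List[Solution] = []
    for sol in solutions:
        area = sol[0]
        if area < min_area:
            continue
        if len(kept) == capacity and area <= kept[-1][0]:
            continue  # no better than the last entry: discard
        pos = bisect.bisect_right(keys, -area)
        keys.insert(pos, -area)
        kept.insert(pos, sol)
        if len(kept) > capacity:  # evict the current last (smallest-area) entry
            keys.pop()
            kept.pop()
    return kept
```

Filling the list incrementally is equivalent to sorting an initial batch and then screening the rest, because once the list is full each new solution is compared only with its last entry.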

Geometric surface complementarity alone may not reliably eliminate unlikely binding modes. In the test cases, we found that some native-like solutions have poorer intermolecular surface contacts than some incorrect solutions. This can push a native-like solution down the list and, in some cases, exclude it from the retained solutions altogether.

We therefore introduced residue pairing preferences into the filtering criteria, giving the double filtering technique implemented in our docking algorithm. As before, an initial set of 1000 solutions was sorted in descending order of interface area, and any other solution with a lower surface matching index than the last element in the list was discarded immediately. However, solutions with a higher surface matching index were not kept automatically; they were first checked for residue pairing preferences at the protein-protein interface. Only structures with both more favorable residue pairing preferences and a larger interface area were saved and inserted into the list according to interface area, and the last element in the list was then discarded.
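Extending the same bounded list with a second check gives a sketch of the double filter. The pairing table `PAIR_PREF` holds made-up values for illustration (real values would come from residue pairing statistics such as those of Glaser et al.^[16]), the pairing score is assumed to be a simple sum over interface residue contacts, and comparing it with the pairing score of the current last list entry is our assumption about what “more favorable” means.

```python
import bisect
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical log-odds pairing preferences (positive = favorable).
PAIR_PREF: Dict[FrozenSet[str], float] = {
    frozenset({"ARG", "ASP"}): 0.8,
    frozenset({"LYS", "GLU"}): 0.7,
    frozenset({"TRP"}): 0.5,   # TRP-TRP pair (frozenset collapses the duplicate)
    frozenset({"LYS"}): -0.6,  # like-charged LYS-LYS pair
}


def pairing_score(contacts: Iterable[Tuple[str, str]]) -> float:
    """Sum pairing preferences over interface residue-residue contacts."""
    return sum(PAIR_PREF.get(frozenset(p), 0.0) for p in contacts)


# (interface_area_nm2, contact_list, mode_id); hypothetical record layout
Solution = Tuple[float, List[Tuple[str, str]], int]


def double_filter(solutions: Iterable[Solution], capacity: int = 1000) -> List[Solution]:
    keys: List[float] = []     # -area, ascending, i.e. area descending
    kept: List[Solution] = []
    scores: List[float] = []   # pairing score of each kept solution
    for sol in solutions:
        area, contacts, _ = sol
        score = pairing_score(contacts)
        if len(kept) == capacity:
            if area <= kept[-1][0]:
                continue  # first filter: surface matching no better than last entry
            if score <= scores[-1]:
                continue  # second filter (assumed form): pairing not more favorable
        pos = bisect.bisect_right(keys, -area)
        keys.insert(pos, -area)
        kept.insert(pos, sol)
        scores.insert(pos, score)
        if len(kept) > capacity:
            keys.pop()
            kept.pop()
            scores.pop()
    return kept
```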

After filtering, binding modes with similar structures among the remaining 1000 solutions were clustered, and each cluster was replaced by an average conformation. This cluster analysis was similar to that used in Janin’s docking method^[8].
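The text does not give the similarity criterion or the averaging procedure used in the cluster analysis. As a hypothetical stand-in, the sketch below groups binding modes greedily by their five docking angles, using an assumed threshold `threshold_deg`, and replaces each cluster by the circular mean of its angles so that, for example, χ = 175° and χ = −175° average to 180° rather than 0°.

```python
import math
from typing import List, Sequence

Mode = Sequence[float]  # (theta1, phi1, theta2, phi2, chi) in degrees


def angular_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def circular_mean(angles: Sequence[float]) -> float:
    """Mean direction of a set of angles, in degrees in (-180, 180]."""
    s = sum(math.sin(math.radians(x)) for x in angles)
    c = sum(math.cos(math.radians(x)) for x in angles)
    return math.degrees(math.atan2(s, c))


def cluster_modes(modes: Sequence[Mode], threshold_deg: float = 15.0) -> List[List[float]]:
    """Greedy clustering: a mode joins the first cluster whose leader differs by
    at most `threshold_deg` in every angle; each cluster is then replaced by
    its circular-mean conformation."""
    clusters: List[List[Mode]] = []
    for m in modes:
        for cl in clusters:
            if all(angular_diff(x, y) <= threshold_deg for x, y in zip(m, cl[0])):
                cl.append(m)
                break
        else:
            clusters.append([m])
    return [[circular_mean(col) for col in zip(*cl)] for cl in clusters]
```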

1.5 Scoring

After 1000 steps of energy minimization with the GROMACS package^[20], all retained structures were evaluated with the scoring function of Eq. (1):

$$
\mathrm{Score} = \Delta E_{\mathrm{elec}} + \Delta G_{\mathrm{des}}(\mathrm{ACE}) + \Delta E_{\mathrm{vdw}} \qquad (1)
$$

where ΔE_elec and ΔE_vdw denoted the changes in electrostatic and van der Waals energies, respectively, calculated with the GROMOS force field^[21], and ΔG_des(ACE) was the desolvation free energy based on the atomic contact energy (ACE)^[22]. In the ACE model, the local interactions were given by $\Delta G_{\mathrm{des}}(\mathrm{ACE}) = \sum_{i,j} e_{ij}$, where e_ij denoted the atomic contact energy between atoms i and j and the sum was taken over all atom pairs less than 0.6 nm apart. To avoid a sharp distance cutoff, we defined a contact weight n_ij as a function of the distance r_ij between the two atoms [Eq. (2)]:

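$$
n_{ij}(r_{ij}) =
\begin{cases}
1, & r_{ij} \le r_{\mathrm{on}} \\
\text{a fraction falling gradually from 1 to 0}, & r_{\mathrm{on}} < r_{ij} < r_{\mathrm{off}} \\
0, & r_{ij} \ge r_{\mathrm{off}}
\end{cases}
\qquad (2)
$$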

where r_on = 0.6 nm and r_off = 1 nm. With this definition, a contact is counted fully when atoms i and j are less than 0.6 nm apart. For r_ij > 0.6 nm, n_ij becomes a fraction that falls gradually to zero as r_ij approaches 1 nm, and for r_ij ≥ 1 nm the pair contributes no atomic contact energy. The desolvation free energy can therefore be written as Eq. (3):

$$
\Delta G_{\mathrm{des}}(\mathrm{ACE}) = \sum_{i,j} e_{ij}\, n_{ij}(r_{ij}) \qquad (3)
$$
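Eqs. (1) to (3) can be combined into a short scoring sketch. The middle branch of n_ij is not specified in the text, so the CHARMM-style switching polynomial used here is an assumption, chosen because it equals 1 at r_on, 0 at r_off and decreases monotonically in between. ΔE_elec, ΔE_vdw and the e_ij values are taken as inputs rather than computed from GROMOS or the ACE parameter tables.

```python
from typing import Iterable, Tuple

R_ON_NM = 0.6   # full contact up to this distance (from the text)
R_OFF_NM = 1.0  # no contact beyond this distance (from the text)


def contact_weight(r: float, r_on: float = R_ON_NM, r_off: float = R_OFF_NM) -> float:
    """n_ij(r): 1 up to r_on, 0 from r_off on, switched smoothly in between.

    The CHARMM-style switching polynomial is an assumed form for the
    middle branch; the paper does not state which form was used.
    """
    if r <= r_on:
        return 1.0
    if r >= r_off:
        return 0.0
    r2, on2, off2 = r * r, r_on * r_on, r_off * r_off
    return (off2 - r2) ** 2 * (off2 + 2.0 * r2 - 3.0 * on2) / (off2 - on2) ** 3


def ace_desolvation(pairs: Iterable[Tuple[float, float]]) -> float:
    """Eq. (3): sum of e_ij * n_ij(r_ij) over (e_ij, r_ij) atom pairs."""
    return sum(e * contact_weight(r) for e, r in pairs)


def score(d_elec: float, d_vdw: float, pairs: Iterable[Tuple[float, float]]) -> float:
    """Eq. (1): electrostatic + ACE desolvation + van der Waals terms."""
    return d_elec + ace_desolvation(pairs) + d_vdw
```

A linear ramp between r_on and r_off would also satisfy the description in the text; the polynomial is used only because its first derivative vanishes at both ends, which avoids kinks in the energy.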

2 Results and Discussion

2.1 Treatment of molecular flexibility and double filtering technique

To examine the effect of the flexibility treatment, we compared docked structures obtained with our modified molecular model and with Janin’s original model against the experimental structure. Fig. 1 shows this comparison for the complex 1brc. Docking started from the structures of trypsin (1bra) and its inhibitor APPI (1aap), superimposed onto the complex 1brc and then separated by 20 nm. On association, the Arg15 side chain of APPI undergoes an obvious conformational change, as a comparison of the bound and unbound inhibitor structures shows. Comparing the docked structures obtained with the modified and original models [Fig. 1(B) and (C)] with the experimental structure [Fig. 1(A)] shows that the modified model tolerates an acceptable overlap between the Arg15 side chain of APPI and Trp215 of trypsin, whereas docking with Janin’s original model produces a major clash between them. Our modified molecular model therefore accommodates the side-chain flexibility of surface residues reasonably well.

Fig. 1 Structures of the complex 1brc

(A) The experimental structure. (B) The docked structure with the modified molecular model. (C) The docked structure with Janin’s original molecular model.

Table 2 summarizes the results of all docking simulations. Columns A, B and C under Searching and Filtering list the numbers of native-like structures retained after the searching and filtering stages. Column A was obtained by docking with the original molecular model and filtering on geometric matching, column B with the modified molecular model and the geometric matching filter, and column C with the modified molecular model and the double filter. A docked conformation is regarded as native-like if the root mean square deviation (RMSD) of its backbone atoms (N, Cα, C, O) from the experimental structure is less than 0.4 nm.
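The native-like criterion reduces to a backbone RMSD threshold. This sketch assumes the predicted and experimental backbone atoms (N, Cα, C, O) are already matched one-to-one and expressed in a common frame; the text does not say whether any refitting was done before computing the RMSD.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]
NATIVE_LIKE_CUTOFF_NM = 0.4  # from the text


def backbone_rmsd(model: Sequence[Vec3], reference: Sequence[Vec3]) -> float:
    """RMSD (nm) between matched backbone atoms, without refitting."""
    if len(model) != len(reference) or not model:
        raise ValueError("need equal, non-empty lists of matched atoms")
    sq = sum((a - b) ** 2 for p, q in zip(model, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(model))


def is_native_like(model: Sequence[Vec3], reference: Sequence[Vec3],
                   cutoff: float = NATIVE_LIKE_CUTOFF_NM) -> bool:
    """A docked conformation counts as native-like if RMSD < cutoff."""
    return backbone_rmsd(model, reference) < cutoff
```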

Table 2 Docking results

	Searching and Filtering⁺			Scoring*
Case	A	B	C	Rank	RMSD (nm)
BOUND
1CHOXX	74	62	97	3	0.13
2SICXX	11	11	8	4	0.18
1ACBXX	25	18	38	1	0.35
2SNIXX	21	15	27	1	0.26
2PTCXX	14	5	23	1	0.27
1TECXX	24	12	35	2	0.31
PSEUDO–UNBOUND
1UDIFX	29	53	67	6	0.20
1JHLXF	-	5	21	22	0.38
1TABFX	81	70	83	1	0.27
1BRCFX	7	19	24	1	0.36
1GLAXF	2	9	29	3	0.34
1TGSFX	22	16	52	6	0.23
3HFLXF	-	2	9	20	0.35
UNBOUND
1BRCFF	-	4	14	1	0.39
1FSSFF	41	66	74	3	0.32
1FQ1FF	-	10	12	41	0.32
1CSEFF	37	40	51	2	0.38
1MAHFF	69	80	79	1	0.06
1MLCFF	-	-	6	104	0.37
2PTCFF	15	6	34	4	0.13
1AHWFF	-	5	16	18	0.21
2KAIFF	9	18	30	1	0.23
1CHOFF	31	40	49	1	0.10
1MDAFF	-	7	7	4	0.25
1BRBFF	15	20	29	6	0.06
1CGIFF	37	49	44	15	0.38

⁺, Numbers of native-like structures generated and retained with the different molecular models and filtering methods. *, Highest rank of a native-like structure and its RMSD (nm) from the experimental structure. –, no native-like structure was found.

Compared with column A of Table 2, the data in column B show that the modified molecular model clearly improves the pseudo-unbound and unbound docking but degrades the bound docking. For most pseudo-unbound and unbound cases, column B contains many more native-like solutions than column A. Moreover, the modified model finds native-like solutions for 1JHLXF, 3HFLXF, 1BRCFF, 1FQ1FF, 1AHWFF and 1MDAFF, for which the original model finds none. When the modified model is combined with the double filter (column C), searching and filtering improve further in most cases for all three docking patterns.

In bound docking, the interfaces of the two starting molecules already fit well, so treating molecular flexibility is unnecessary. This explains why softening the molecular surface brings no improvement in the bound docking simulations.

Although softening the molecular surface may be important for capturing native-like solutions in unbound docking, it also reduces the difference in geometric complementarity between correct and incorrect solutions. The geometric matching filter alone may therefore fail to eliminate incorrect solutions reliably, and the filtering criterion needs to go beyond geometric complementarity. The residue pairing preferences are statistics derived from a nonredundant database of 621 protein-protein interfaces^[16] and describe the physicochemical and structural preferences at protein-protein interfaces well. Introducing them into the filtering criteria therefore retains many more native-like solutions for all three docking patterns.

2.2 Scoring putative complexes

Under Scoring, Table 2 also lists the rank of the first native-like structure for each of the 26 complexes and its RMSD from the experimental structure. For 20 of the 26 complexes, the first native-like structure is ranked within the top 10. For 1BRCFF, although there is a major clash between the Arg15 side chain of APPI and Trp215 of trypsin in the unbound structures, a native-like structure is still found and ranked first. Fig. 2 compares the experimental structures with the best-ranked native-like predictions for 1CSEFF and 2KAIFF from Table 2; in both cases the binding site is identified satisfactorily. This indicates that the scoring function, which includes electrostatic, desolvation and van der Waals energy terms, is relatively successful in distinguishing correct binding modes from incorrect ones.
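The count of 20 top-10 rankings can be checked directly against the Rank column of Table 2. The short tally below uses only the values printed in the table and breaks the count down by docking pattern.

```python
# Rank of the first native-like structure, copied from Table 2.
RANKS = {
    "BOUND": {"1CHOXX": 3, "2SICXX": 4, "1ACBXX": 1, "2SNIXX": 1,
              "2PTCXX": 1, "1TECXX": 2},
    "PSEUDO-UNBOUND": {"1UDIFX": 6, "1JHLXF": 22, "1TABFX": 1, "1BRCFX": 1,
                       "1GLAXF": 3, "1TGSFX": 6, "3HFLXF": 20},
    "UNBOUND": {"1BRCFF": 1, "1FSSFF": 3, "1FQ1FF": 41, "1CSEFF": 2,
                "1MAHFF": 1, "1MLCFF": 104, "2PTCFF": 4, "1AHWFF": 18,
                "2KAIFF": 1, "1CHOFF": 1, "1MDAFF": 4, "1BRBFF": 6,
                "1CGIFF": 15},
}


def count_top(ranks, cutoff=10):
    """Number of complexes whose first native-like structure ranks <= cutoff."""
    return sum(1 for group in ranks.values() for r in group.values() if r <= cutoff)


per_group = {name: sum(r <= 10 for r in g.values()) for name, g in RANKS.items()}
total = sum(len(g) for g in RANKS.values())
print(per_group, count_top(RANKS), "of", total)
# {'BOUND': 6, 'PSEUDO-UNBOUND': 5, 'UNBOUND': 9} 20 of 26
```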

Fig.2 Superposition of the experimental structures of two protein-protein complexes and the best ranked native-like predictions reported in Table 2

(A) 1CSEFF; (B) 2KAIFF. Thick lines, Cα trace of the experimental structure; thin lines, Cα trace of the predicted structure.

3 Conclusions

It should be noted that the docking simulations in this paper assume that the binding region on one of the two proteins is known. In the spherical polar coordinates used here, this information enters as a simple constraint on just one or two angular degrees of freedom, and applying these constraints before docking reduces the execution time to several minutes. Ritchie and Kemp used the same coordinates in their docking algorithm^[10] and successfully predicted the structures of several protein-protein complexes. In their test, with the search ranges of two angular degrees of freedom limited to ±30° around the active site, the first native-like structures of 7 out of 18 complexes were ranked in the top 10^[10]. In this paper, the first native-like conformations of 20 out of 26 tested complexes are ranked in the top 10. This indicates that our algorithm captures some important factors in protein-protein association and can be useful for studying molecular recognition.

In summary, our soft docking algorithm has several advantages: (1) the modified molecular model improves the results of unbound protein-protein docking; (2) the double filtering technique retains many more native-like structures and increases the probability of successfully predicting the structures of protein-protein complexes; (3) the scoring function based on the binding free energy distinguishes correct structures from incorrect ones effectively. However, the method also has a few shortcomings. For instance, the partial search of the binding space is clearly a limitation when no information about the binding site is available. In addition, the desolvation free energy is calculated only approximately. Work to improve our docking algorithm is currently underway.

References

1 Cherfils J, Duquerroy S, Janin J. Protein-protein recognition analyzed by docking simulation. Proteins, 1991, 11: 271-280

2 Chothia C, Novotny J, Bruccoleri R, Karplus M. Domain association in immunoglobulin molecules: The packing of variable domains. J Mol Biol, 1985, 186: 651-663

3 Janin J, Chothia C. The structure of protein-protein recognition sites. J Biol Chem, 1990, 265(27): 16027-16030

4 Jones S, Thornton JM. Principles of protein-protein interactions. Proc Natl Acad Sci USA, 1996, 93(1): 13-20

5 Xie ZQ, Ding DF, Xu GJ. Delineation of continuous domain in proteins by differences of free energy. Acta Biochim Biophys Sin, 2001, 33(4): 386-394

6 Halperin I, Ma B, Wolfson H, Nussinov R. Principles of docking: An overview of search algorithms and a guide to scoring functions. Proteins, 2002, 47: 409-443

7 Lo Conte L, Chothia C, Janin J. The atomic structure of protein-protein recognition sites. J Mol Biol, 1999, 285: 2177-2198

8 Cherfils J, Janin J. Protein docking algorithms: Simulating molecular recognition. Curr Opin Struct Biol, 1993, 3: 265-269

9 Jiang F, Kim SH. “Soft docking”: Matching of molecular surface cubes. J Mol Biol, 1991, 219: 79-102

10 Ritchie DW, Kemp GJ. Protein docking using spherical polar Fourier correlations. Proteins, 2000, 39: 178-194

11 Palma PN, Krippahl L, Wampler JE, Moura JJ. Bigger: A new (soft) docking algorithm for predicting protein interactions. Proteins, 2000, 39: 372-384

12 Sandak B, Nussinov R, Wolfson HJ. An automated computer vision and robotics-based technique for 3-D flexible biomolecular docking and matching. Comput Appl Biosci, 1995, 11: 87-99

13 Vakser IA. Protein docking for low-resolution structures. Protein Eng, 1995, 8: 371-377

14 Walls PH, Sternberg MJ. New algorithm to model protein-protein recognition based on surface complementarity. Applications to antibody-antigen docking. J Mol Biol, 1992, 228: 277-297

15 Shoichet BK, Kuntz ID. Protein docking and complementarity. J Mol Biol, 1991, 221: 327-346

16 Glaser F, Steinberg DM, Vakser IA, Ben-Tal N. Residue frequencies and pairing preferences at protein-protein interfaces. Proteins, 2001, 43: 89-102

17 Lin SL, Nussinov R, Fischer D, Wolfson HJ. Molecular surface representations by sparse critical points. Proteins, 1994, 18: 94-101

18 Gabb HA, Jackson RM, Sternberg MJ. Modelling protein docking using shape complementarity, electrostatics and biochemical information. J Mol Biol, 1997, 272: 106-120

19 Zhao S, Goodsell DS, Olson AJ. Analysis of a data set of paired uncomplexed protein structures: New metrics for side-chain flexibility and model evaluation. Proteins, 2001, 43: 271-279

20 van der Spoel D, van Buuren AR, Apol E, Meulenhoff PJ, Sijbers ALTM, Hess B, Feenstra KA et al. Biomolecular Simulation: The GROMACS User Manual. Groningen, Netherlands: Biomos, 1991

21 van Gunsteren WF, Billeter SR, Eising AA, Hünenberger PH, Krüger R, Mark AE, Scott WRP et al. Biomolecular Simulation: The GROMOS96 Manual and User Guide, Zürich, Switzerland: Hochschulverlag AG an der ETH, 1996

22 Zhang C, Vasmatzis G, Cornette JL, DeLisi C. Determination of atomic desolvation energies from the structures of crystallized proteins. J Mol Biol, 1997, 267(3): 707-726

Received: May 20, 2002 Accepted: September 9, 2002
This work was supported by grants from the Natural Science Foundation of China (No. 29992590-2, 30170230 and 10174005) and the Beijing Natural Science Foundation (No. 5032002)
*Corresponding author: Tel, 86-10-67392724; Fax, 86-10-67391738; e-mail, [email protected]