Title: The Impact of Crystallographic Data for the Development of Machine Learning Models to Predict Protein-Ligand Binding Affinity
Volume: 28
Issue: 34
Author(s): Martina Veit-Acosta*, Walter Filgueira de Azevedo Junior*
Affiliation:
- Western Michigan University, 1903 Western, Michigan Ave, Kalamazoo, MI 49008, United States
- Pontifical Catholic University of Rio Grande do Sul (PUCRS); Av. Ipiranga, 6681 Porto Alegre/RS 90619-900, Brazil
Keywords:
Crystal structures, machine learning, scoring function space, binding affinity, SAnDReS, Taba.
Abstract:
Background: One of the main challenges in the early stages of drug discovery
is the computational assessment of protein-ligand binding affinity. Machine learning techniques
can contribute to predicting this type of interaction. We may apply these techniques
following two approaches: first, using experimental structures for which
affinity data are available; second, using protein-ligand docking simulations.
Objective: In this review, we describe recently published machine learning models based
on crystal structures for which binding affinity and thermodynamic data are available.
Method: We used experimental structures available from the Protein Data Bank and accessed
binding affinity and thermodynamic data through the BindingDB, Binding MOAD, and
PDBbind databases. We reviewed machine learning models for predicting binding affinity
that were created using open-source programs, such as SAnDReS and Taba.
Results: Analysis of machine learning models trained against datasets composed of crystal
structures of protein-ligand complexes indicated that these models achieve higher
predictive performance than classical scoring functions.
Conclusion: The rapid increase in the number of crystal structures of protein-ligand complexes
has created a favorable scenario for developing machine learning models to predict
binding affinity. These models rely on experimental data from two sources: structural
data and affinity data. Combining these experimental data yields computational models
that outperform classical scoring functions.