Knowing which biological properties of a compound govern the interaction is a critical step in the design and development of new drugs. According to the study, a benefit of graph convolutional networks is their robustness to different orientations of the three-dimensional (3D) structures of proteins; a drawback, however, is the difficulty of obtaining high-quality 3D protein structures.
In this study, the 3D protein structures were taken from the Protein Data Bank (PDB), which holds structures determined by experimental methods such as nuclear magnetic resonance (NMR), X-ray diffraction and cryogenic electron microscopy (cryo-EM). The binding sites were extracted with a previously published docking-based method, which provides bounding-box coordinates for each binding site of a protein. These coordinates are then used to convert the protein structure into a set of peptide fragments. A graph is then constructed for each fragment, in which each atom acts as a node and the connections between atoms act as edges. Each atom is described by a feature vector comprising a one-hot encoding of the atom type, the atom degree, the total number of hydrogen atoms and the implicit valence of the atom. The drug compounds, given as Simplified Molecular-Input Line-Entry System (SMILES) strings, were also converted to graphs, in which each atom of the small molecule is a node and the connections between atoms are edges. Here, the feature vector of each atom comprises a one-hot encoding of the atom type, the atom degree, the formal charge, the number of radical electrons, the hybridisation, the aromaticity and the total number of hydrogens of the atom.
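The graph construction described above can be illustrated with a minimal sketch. It is not the study's code: the atom vocabulary `ATOM_TYPES`, the `Atom` class and the `build_graph` helper are hypothetical names chosen for illustration, only three of the listed features are included, and the degree counts heavy-atom neighbours only. In practice a cheminformatics toolkit would derive atoms and bonds from the PDB file or SMILES string; here they are entered by hand for ethanol.

```python
from dataclasses import dataclass

# Hypothetical atom vocabulary for the one-hot encoding (an assumption,
# not the vocabulary used in the study). Unknown symbols map to "other".
ATOM_TYPES = ["C", "N", "O", "S", "other"]


@dataclass
class Atom:
    symbol: str          # element symbol, e.g. "C"
    num_hydrogens: int   # total hydrogens attached to this atom


def one_hot(symbol):
    """Return a one-hot vector for the atom type."""
    idx = ATOM_TYPES.index(symbol) if symbol in ATOM_TYPES else len(ATOM_TYPES) - 1
    return [1 if i == idx else 0 for i in range(len(ATOM_TYPES))]


def build_graph(atoms, bonds):
    """Build an undirected graph: atoms are nodes, bonds are edges.

    Returns an adjacency list and one feature vector per atom:
    [one-hot atom type..., degree, number of hydrogens].
    """
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    features = [
        one_hot(atom.symbol) + [len(adjacency[i]), atom.num_hydrogens]
        for i, atom in enumerate(atoms)
    ]
    return adjacency, features


# Ethanol (SMILES "CCO"), heavy atoms only: CH3-CH2-OH
atoms = [Atom("C", 3), Atom("C", 2), Atom("O", 1)]
bonds = [(0, 1), (1, 2)]
adjacency, features = build_graph(atoms, bonds)
```

The resulting `features` list has one row per atom and could be passed, together with `adjacency`, to a graph convolutional layer. The remaining features named in the text (formal charge, hybridisation, aromaticity and so on) would be appended to each row in the same way.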
A one-dimensional representation is insufficient for complex interactions, particularly for proteins, which are much larger and more complex molecules than drugs. The study attributes the improved performance of the model to its use of graph representations, which capture the structural information of molecules more fully than string representations. According to the study, traditional machine learning and deep learning methods that use string representations cannot learn the complex non-linear relationships in drug-target interaction (DTI). The self-attention mechanism helps the AttentionSiteDTI model extract features automatically and learn higher-order non-linear relationships. The team used three benchmark datasets, DUD-E, Human and BindingDB, to compare the new model with state-of-the-art graph-based models. AttentionSiteDTI performs comparably to the state-of-the-art DTI prediction models when the target protein is one the models were trained on. However, when the target protein is changed to one the models have not been trained on, the performance of AttentionSiteDTI remains robust while the performance of the other models decreases significantly, which indicates that the new model generalises better. This is important because it shows that the AttentionSiteDTI model can be used with high performance for a broad variety of protein targets.
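To give a sense of what a self-attention step computes, the sketch below implements generic scaled dot-product self-attention in plain Python. It is an illustration under simplifying assumptions, not the AttentionSiteDTI architecture: the learned query, key and value projections are replaced by the identity, and the input vectors are made-up stand-ins for binding-site and drug embeddings.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity projections.

    Each vector attends to every vector in the sequence (itself included).
    Returns the attended outputs and the attention-weight matrix.
    """
    d = len(vectors[0])
    scale = math.sqrt(d)
    weights = []
    for q in vectors:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in vectors]
        weights.append(softmax(scores))
    # Each output is the weighted average of all input vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, vectors)) for j in range(d)]
        for row in weights
    ]
    return outputs, weights


# Toy embeddings (illustrative values only): two "binding sites" and one "drug".
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(embeddings)
```

Each row of `weights` sums to one and indicates how strongly one element attends to the others. In a trained model, weights of this kind are what allow the most relevant binding sites for a given drug to be identified.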
This study is significant because it will help other researchers accelerate drug design by identifying the functional properties of binding sites. Drug designers can use AI to respond quickly to new diseases and pandemics such as COVID-19, focusing on the most important binding sites of the virus's proteins. They can screen many variations of the protein and small molecules using AI to obtain accurate predictions of binding before carrying out any laboratory experiments.
Furthermore, the team evaluated the binding between the spike protein of the SARS-CoV-2 virus (along with the ACE2 protein) and seven candidate compounds (N-acetylneuraminic acid, 3α,6α-mannopentaose, N-glycolylneuraminic acid, 2-keto-3-deoxyoctonate, N-acetyllactosamine, cytidine-5′-monophospho-N-acetylneuraminic acid sodium salt and darunavir) using a binding inhibition assay kit. The strength of the interaction for each drug-target pair was measured in laboratory experiments as the IC50 (half-maximal inhibitory concentration). In this study, the candidate molecules were used as inhibitors of the formation of the spike protein-ACE2 complex. The activity threshold was set at 15 nM to identify the best compounds. This evaluation showed high agreement between the computational predictions and the experimental results. It demonstrates the potential of the AttentionSiteDTI model as an effective tool for drug designers to pre-screen small molecules in drug repurposing applications, both for the current pandemic, since drugs to treat COVID-19 are still of interest, and in preparation for possible future pandemics.
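The comparison between prediction and experiment can be sketched as a simple thresholding step. The example below assumes that a compound counts as active when its measured IC50 is at or below the 15 nM threshold (lower IC50 means stronger inhibition), and that the model outputs a binary active/inactive label. The compound names, IC50 values and predictions are placeholders for illustration only; they are not results from the study.

```python
ACTIVITY_THRESHOLD_NM = 15.0  # activity threshold stated in the text


def label_active(ic50_nm, threshold_nm=ACTIVITY_THRESHOLD_NM):
    """Assumption: a compound is 'active' if its IC50 is at or below the threshold."""
    return ic50_nm <= threshold_nm


def agreement(measured_ic50_nm, predicted_active):
    """Fraction of compounds whose experimental label matches the prediction."""
    matches = sum(
        label_active(measured_ic50_nm[name]) == predicted_active[name]
        for name in measured_ic50_nm
    )
    return matches / len(measured_ic50_nm)


# Placeholder data (hypothetical compounds and values, NOT from the study).
measured = {"compound_A": 4.0, "compound_B": 22.0, "compound_C": 15.0, "compound_D": 90.0}
predicted = {"compound_A": True, "compound_B": False, "compound_C": True, "compound_D": True}

score = agreement(measured, predicted)
print(f"Agreement between prediction and experiment: {score:.2f}")
```

With these placeholder inputs, three of the four labels match, so the script reports an agreement of 0.75. Whether the threshold is inclusive is itself an assumption, since the text does not say how a compound exactly at 15 nM was treated.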