Antibodies, small proteins produced by the immune system, combat a virus by attaching themselves to it and attempting to neutralise it. Scientists are attempting to create synthetic antibodies to combat the SARS-CoV-2 virus that causes COVID-19.
To create these antibodies successfully, researchers need to know exactly how the proteins will attach to the spike protein of the virus. However, finding the right fit among the millions of possible protein complex combinations is an arduous task that would take software weeks, if not months.
A machine-learning model created by MIT researchers may make this process 80 to 500 times faster. The model predicts the complex that forms when two proteins bind together, and the initial research also suggests that its predictions are closer to the complexes that actually form.
“Deep learning is very good at capturing interactions between different proteins that are otherwise difficult for chemists or biologists to write experimentally. Some of these interactions are very complicated, and people haven’t found good ways to express them. This deep-learning model can learn these types of interactions from data,” says Octavian-Eugen Ganea, a postdoc in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-lead author of the paper.
The model, called Equidock, performs rigid-body docking: the process of fitting two proteins together by rotating or translating them in 3D space, without distorting or bending their shapes.
Equidock converts the 3D structures of the proteins into 3D graphs that a neural network can process. Proteins are chains of amino acids, and each amino acid is represented by a node in the graph. The researchers also built geometric and mathematical knowledge into the model, so that it understands how an object changes when it is rotated or translated in 3D space, and so that the proteins almost always attach in the same way no matter where they are located, in a similar way to how proteins dock in the human body.
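As a rough illustration of the graph idea described above, the sketch below links each amino acid (one node per residue) to its nearest neighbours in space, then checks that the resulting graph is unchanged when the whole protein is rotated and moved. This is not Equidock's code: the nearest-neighbour rule, the function names `knn_graph` and `rotation_z`, and the random coordinates standing in for a real protein are all assumptions made for the example.

```python
import numpy as np


def knn_graph(coords, k=3):
    """Return directed edges (i, j) linking each node to its k nearest neighbours.

    coords is an (n, 3) array with one 3D point per amino acid (hypothetical input).
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)      # pairwise distances, shape (n, n)
    np.fill_diagonal(dist, np.inf)            # a node is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]
    return {(i, int(j)) for i in range(len(coords)) for j in neighbours[i]}


def rotation_z(angle):
    """Rotation matrix for a turn of `angle` radians about the z axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Random points stand in for residue positions (assumption, not real data).
rng = np.random.default_rng(0)
residues = rng.normal(size=(8, 3))

# Rigid motion: rotate, then translate. The shape itself is untouched.
moved = residues @ rotation_z(1.2).T + np.array([5.0, -3.0, 2.0])

edges_before = knn_graph(residues)
edges_after = knn_graph(moved)

# Distances between residues survive a rigid motion, so the graph should too.
print(edges_before == edges_after)
```

Because the graph is built only from distances between residues, it carries no information about where the protein happens to sit in space, which is the kind of property the researchers describe building into the model.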
“If we can understand from the proteins which individual parts are likely to be these binding pocket points, then that will capture all the information we need to place the two proteins together. Assuming we can find these two sets of points, then we can just find out how to rotate and translate the proteins so one set matches the other set,” Ganea explains.
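The last step Ganea describes, finding the rotation and translation that make one set of points match another, can be sketched with the Kabsch algorithm, a standard technique for aligning matched 3D point sets. The article does not say which method Equidock uses, so this is an illustration under that assumption; the function name `kabsch_align` and the made-up pocket points are hypothetical.

```python
import numpy as np


def kabsch_align(moving, target):
    """Find rotation R and translation t so that moving @ R.T + t best matches target.

    Both inputs are (n, 3) arrays whose rows correspond point for point.
    """
    moving_centre = moving.mean(axis=0)
    target_centre = target.mean(axis=0)

    # Cross-covariance of the two centred point sets.
    H = (moving - moving_centre).T @ (target - target_centre)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = target_centre - R @ moving_centre
    return R, t


# Made-up pocket points on the first protein (assumption, not real data).
rng = np.random.default_rng(0)
pocket_a = rng.normal(size=(6, 3))

# Build a known rigid motion: a turn about z, then a turn about x, then a shift.
cz, sz = np.cos(0.8), np.sin(0.8)
cx, sx = np.cos(-1.1), np.sin(-1.1)
turn_z = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
turn_x = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
true_rotation = turn_z @ turn_x
true_shift = np.array([4.0, -2.0, 1.5])

# The matching points on the second protein are the same points after that motion.
pocket_b = pocket_a @ true_rotation.T + true_shift

R, t = kabsch_align(pocket_a, pocket_b)
placed = pocket_a @ R.T + t

# Largest remaining gap between the placed points and their targets.
print(np.abs(placed - pocket_b).max())
```

With exact, noise-free correspondences the recovered motion should reproduce the one used to build the example; with noisy predicted points, the same calculation gives the least-squares best fit rather than an exact match.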
According to the researchers, the lack of appropriate training datasets was a significant challenge, since 3D data on proteins is so hard to come by. That made it especially important to incorporate geometric knowledge into Equidock, Ganea says. Without those geometric constraints, the model might pick up false correlations in the dataset.
When the researchers compared the model with four other software methods, Equidock was significantly faster: it took one to five seconds to predict a nearly accurate final protein complex, while the others took anywhere from 10 minutes to an hour or longer. However, its predictions were sometimes less accurate than those of the baselines.
“We are still lagging behind one of the baselines. Our method can still be improved, and it can still be useful. It could be used in a very large virtual screening where we want to understand how thousands of proteins can interact and form complexes. Our method could be used to generate an initial set of candidates very fast, and then these could be fine-tuned with some of the more accurate, but slower, traditional methods,” he says.
To make it more accurate, the team wants to incorporate specific atomic interactions into Equidock. In the future, they plan to enhance Equidock so it can make predictions for flexible protein docking. The biggest hurdle there is a lack of training data, so Ganea and his colleagues are working to generate synthetic data they could use to improve the model.
This work was funded, in part, by the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium, the Swiss National Science Foundation, the Abdul Latif Jameel Clinic for Machine Learning in Health, the DTRA Discovery of Medical Countermeasures Against New and Emerging (DOMANE) threats program, and the DARPA Accelerated Molecular Discovery program.
The research, a collaboration across continents, will be presented at the International Conference on Learning Representations. Other co-authors of the paper include Xinyuan Huang, a graduate student at ETH Zurich and co-lead author; Regina Barzilay, the School of Engineering Distinguished Professor for AI and Health in CSAIL; and Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering in CSAIL and a member of the Institute for Data, Systems, and Society.
Source: indiaai.gov.in