Running a drug screening programme is a lot like throwing a huge cocktail party and listening in on the conversation. There is a lot of small talk at cocktail parties, but only a few real exchanges. Similarly, in drug screening programmes, low-affinity drug-target interactions far outnumber high-affinity binding events.
Imagine having to listen to every word said at a cocktail party. That would undoubtedly be tedious. Now consider how much harder it would be to examine every drug-target interaction in a standard drug screen. Even the most patient listener, in this case a typical artificial intelligence (AI) system, would be exhausted.
Unfortunately, traditional AI algorithms take a long time to sift through data about drug candidates' interactions with protein targets. Most of these algorithms first calculate the three-dimensional structure of each target protein from its amino-acid sequence, then use those structures to predict which drug compounds the protein will bind. The method is exhaustive but slow.
To speed things up, MIT and Tufts University researchers developed an alternative computational technique based on a form of AI known as a large language model. These models, which underlie tools such as ChatGPT, can analyse massive quantities of text to determine which words (or, in this case, amino acids) are most likely to appear together. The MIT/Tufts team named its large language model ConPLex. It can match target proteins with potential drug compounds without the computationally intensive step of calculating molecular structures.
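As a rough illustration of that idea, the sketch below embeds a protein sequence and a drug's SMILES string into one shared space and scores the pair by distance. Everything here is a stand-in: the hash-seeded featurisers and random projection matrices are hypothetical placeholders for ConPLex's pretrained protein language model, chemical fingerprints, and trained weights.

```python
import hashlib

import numpy as np

def embed_protein(sequence: str, dim: int = 1024) -> np.ndarray:
    """Placeholder for a pretrained protein-language-model embedding."""
    seed = int.from_bytes(hashlib.sha256(sequence.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)

def embed_drug(smiles: str, dim: int = 2048) -> np.ndarray:
    """Placeholder for a chemical fingerprint of the drug molecule."""
    seed = int.from_bytes(hashlib.sha256(smiles.encode()).digest()[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)

# Learned linear maps would project both embeddings into one shared space;
# random matrices stand in for the trained weights here.
rng = np.random.default_rng(0)
SHARED_DIM = 256
W_PROTEIN = rng.standard_normal((1024, SHARED_DIM)) / np.sqrt(1024)
W_DRUG = rng.standard_normal((2048, SHARED_DIM)) / np.sqrt(2048)

def interaction_score(sequence: str, smiles: str) -> float:
    """Higher score = predicted binder; no 3D structure is ever computed."""
    p = embed_protein(sequence) @ W_PROTEIN
    d = embed_drug(smiles) @ W_DRUG
    return float(-np.linalg.norm(p - d))  # negative distance as the score

print(interaction_score("MKTAYIAKQRQISFVKSHFSRQ", "CC(=O)Oc1ccccc1C(=O)O"))
```

Because each protein and each compound is featurised once and compared by simple distance, the expensive structure-prediction step drops out entirely.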
ConPLex was described in the journal PNAS in an article titled "Contrastive learning in protein language space predicts interactions between drugs and protein targets." According to the authors, ConPLex outperforms state-of-the-art techniques by leveraging advances in pretrained protein language models ("PLex") and employing protein-anchored contrastive co-embedding ("Con").
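The "Con" half of the name can be pictured as a protein-anchored triplet objective: for each target protein (the anchor), a known binder should land closer in the shared space than a decoy, by at least some margin. The sketch below shows one common form of such a loss; it is illustrative and may differ in detail from the objective used in the paper.

```python
import numpy as np

def protein_anchored_triplet_loss(anchor, binder, decoy, margin=1.0):
    """Triplet margin loss with the protein co-embedding as the anchor.

    The true binder is pulled closer to the protein than the decoy by at
    least `margin`; pairs that already satisfy the margin incur no loss.
    """
    d_pos = np.linalg.norm(anchor - binder)  # protein <-> true binder
    d_neg = np.linalg.norm(anchor - decoy)   # protein <-> decoy compound
    return max(0.0, float(d_pos - d_neg + margin))

# Toy check: with the binder near the anchor and the decoy far away,
# the margin is already satisfied, so the loss comes out as zero.
rng = np.random.default_rng(1)
protein = rng.standard_normal(256)
binder = protein + 0.1 * rng.standard_normal(256)  # near the anchor
decoy = rng.standard_normal(256)                   # unrelated point
print(protein_anchored_triplet_loss(protein, binder, decoy))
```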
“ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds,” the authors noted in their article. “It makes binding predictions based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome.”
The researchers next put their model to the test by screening a library of over 4,700 candidate drug compounds for their ability to bind to a panel of 51 enzymes known as protein kinases.
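Because prediction reduces to distances between fixed-length vectors, a screen of this size is essentially one matrix computation. The sketch below ranks hits across a compound-by-kinase grid; the counts match the study, but the embeddings are random placeholders rather than real ConPLex outputs.

```python
import numpy as np

N_COMPOUNDS, N_KINASES, DIM = 4700, 51, 256  # counts match the study

rng = np.random.default_rng(2)
compound_vecs = rng.standard_normal((N_COMPOUNDS, DIM))  # co-embedded drugs
kinase_vecs = rng.standard_normal((N_KINASES, DIM))      # co-embedded kinases

# Pairwise Euclidean distances via |a-b|^2 = |a|^2 + |b|^2 - 2ab;
# a smaller distance means stronger predicted binding.
sq = ((compound_vecs**2).sum(1)[:, None]
      + (kinase_vecs**2).sum(1)[None, :]
      - 2.0 * compound_vecs @ kinase_vecs.T)
dist = np.sqrt(np.maximum(sq, 0.0))  # shape (4700, 51)

# Rank all 4,700 x 51 pairs and keep the 19 best for wet-lab follow-up.
best = np.argsort(dist, axis=None)[:19]
for c, k in zip(*np.unravel_index(best, dist.shape)):
    print(f"compound {c} vs kinase {k}: distance {dist[c, k]:.3f}")
```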
From the top hits, the researchers chose 19 drug-protein pairings to investigate experimentally. The experiments found that 12 of the 19 exhibited strong binding affinity (in the nanomolar range), whereas nearly all of the other candidate drug-protein pairings showed no detectable affinity. Four of these pairings bound with exceptionally high, sub-nanomolar affinity, strong enough to block the protein at drug concentrations on the order of parts per billion.
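The parts-per-billion framing follows from a short unit conversion. Assuming a typical small-molecule drug of roughly 500 g/mol (an illustrative value, not taken from the paper) dissolved in water:

```python
# Convert a 1 nM drug concentration to mass parts per billion in water.
MOLAR_MASS = 500.0   # g/mol, a typical small-molecule drug (assumed value)
CONC = 1e-9          # 1 nanomolar, in mol/L

grams_per_litre = CONC * MOLAR_MASS    # 5e-7 g of drug per litre
# One litre of water weighs roughly 1,000 g, so the mass fraction is:
ppb = grams_per_litre / 1000.0 * 1e9   # ~0.5 parts per billion
print(f"1 nM of a {MOLAR_MASS:.0f} g/mol drug is about {ppb:.1f} ppb")
```

A 1 nM dose thus works out to about half a part per billion by mass, so sub-nanomolar binders are active at even lower concentrations.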
While this study focused mainly on screening small-molecule drugs, the researchers are now working to adapt the approach to other types of therapeutics, such as therapeutic antibodies. This kind of modelling could also be useful for running toxicity screens on candidate drug compounds before they are tested in animal models, to ensure they don't have undesired side effects.
“This work addresses the need for efficient and accurate in silico screening of potential drug candidates,” explained Bonnie Berger, PhD, an MIT researcher and one of the study’s senior authors. “[Our model] enables large-scale screens for assessing off-target effects, drug repurposing, and determining the impact of mutations on drug binding.”
“One of the reasons drug discovery is so expensive is that it has such a high failure rate,” said Rohit Singh, PhD, an MIT researcher and one of the study’s lead authors. “If we can reduce failure rates by stating upfront that this drug is unlikely to work, that could go a long way towards lowering drug discovery costs.”