A team of researchers from the Agency for Science, Technology and Research (A*STAR) and the National University of Singapore (NUS) has developed software that reliably predicts chemical modifications of RNA molecules from RNA sequencing data.
The tool, called m6Anet, is described in Nature Methods.
Chemical modifications incorporated into RNA influence how the molecule behaves. However, the conventional methods scientists use to read RNA often fail to detect these modifications. More than 160 RNA modifications are known; the most abundant internal modification of messenger RNA, N6-methyladenosine (m6A), has been linked to human diseases such as cancer.
Until recently, detecting RNA modifications required lengthy, laborious laboratory experiments that were out of reach for most scientists. In addition, earlier techniques could not detect m6A at the single-molecule level, which is essential for understanding the biological processes m6A is involved in.
The team overcame these limitations using direct Nanopore RNA sequencing, a newer technology that reads native RNA molecules together with their modifications. In this study, they developed m6Anet, software that applies multiple-instance learning (MIL) to large volumes of direct Nanopore RNA sequencing data to train deep neural networks that accurately detect m6A.
“In traditional machine learning, each example we want to categorise typically has one label. For instance, when each image is labelled as either a cat or not a cat, the algorithm learns from those labels to distinguish cat images from other images. Recognising m6A is harder because there is an enormous volume of data with ambiguous labels. Picture yourself trying to find a particular photo in an album of millions of other photos, with no labels to guide your search. Fortunately, this MIL problem has been studied in the machine learning literature before,” said Christopher Hendra, a PhD candidate at A*STAR’s Genome Institute of Singapore (GIS) and the NUS Institute of Data Science, and the study’s first author.
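In multiple-instance learning, labels are attached to groups ("bags") of examples rather than to each example. For m6A detection, a natural reading is that a candidate site is the bag and the individual sequencing reads covering it are the instances: a site may be known to be modified without knowing which reads carry m6A. The sketch below illustrates that idea with the standard MIL rule (a bag is positive if any instance is positive) and noisy-OR pooling, one common way to turn per-read probabilities into a site-level probability. Noisy-OR, the function names `bag_label` and `site_probability`, and the example scores are illustrative assumptions, not m6Anet's published code.

```python
from math import prod
from typing import Sequence


def bag_label(read_labels: Sequence[int]) -> int:
    """Standard MIL assumption: a bag (site) is positive if any instance (read) is positive."""
    return int(any(read_labels))


def site_probability(read_probs: Sequence[float]) -> float:
    """Noisy-OR pooling (an assumed choice) of per-read probabilities into one site probability.

    Treats reads as independent, so P(site modified) = 1 - prod(1 - p_i),
    i.e. the probability that at least one read carries the modification.
    """
    for p in read_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return 1.0 - prod(1.0 - p for p in read_probs)


if __name__ == "__main__":
    # Hypothetical per-read scores for two candidate sites (made-up numbers for illustration).
    unmodified_site = [0.05, 0.10, 0.02, 0.08]
    modified_site = [0.05, 0.90, 0.10, 0.85]
    print(round(site_probability(unmodified_site), 3))
    print(round(site_probability(modified_site), 3))
```

Because only the site-level label is needed for training under this framing, a model can learn per-read probabilities without read-level annotations. One trade-off of noisy-OR is that the pooled probability rises with the number of reads, so per-read scores must be well calibrated to avoid flagging sites with many weakly scored reads.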
The team showed that m6Anet can accurately detect m6A from a single sample, at the single-molecule level, and across species.
“Our AI model has only seen data from a human sample, but it is able to accurately identify RNA modifications even in samples from species that the model has not seen before,” said Dr. Jonathan Göke, Group Leader of the Laboratory of Computational Transcriptomics at A*STAR’s GIS and senior author of the study. “Being able to recognise RNA modifications in diverse biological samples helps us understand their roles in many different applications, such as cancer research or plant genomics.”
“It is quite gratifying to see how theoretically sound and well-researched machine learning techniques, such as MIL, can provide an elegant solution to this difficult problem. The fact that the scientific community has adopted the software so quickly is a reward for our work,” said study co-lead Associate Professor Alexandre Thiery of the Department of Statistics and Data Science at the NUS Faculty of Science.
According to Prof. Patrick Tan, Executive Director of A*STAR’s GIS, “Accurately and efficiently identifying RNA modifications has been a long-standing challenge, and m6Anet helps to address it. This AI technique and the study’s findings have been made publicly available for the benefit of the wider scientific community, so that other researchers can advance their work more quickly.”