It has been more than a year since the world first caught a glimpse of the novel coronavirus, which was to bring an era of devastation, despoliation, disruption, and disorder. It was then that humans first encountered a virus that would go on to plunder their health, ransack their world, desecrate the sanctity of life, and bring the entire planet to a standstill.
Today, 15 months into the pandemic, much about it remains unknown. The virus has been changing the game every few weeks, mutating as it passes from one host to the next. Its effects range from no symptoms at all to life-threatening illness, and from a treatable cough to acute respiratory distress syndrome, and each of these presentations brings its own complexities.
This variability in how SARS-CoV-2 infection plays out has highlighted the urgent need to identify early which cases will stay mild and which will escalate to critical illness.
Encouragingly, progress is being made on this front with the aid of technology, particularly Artificial Intelligence (AI). AI and Machine Learning (ML) are being used to identify high-risk patients at an earlier stage and help reduce mortality.
COVIDOUTCOME is the result of one such attempt to use AI against the pandemic. It is an online prediction platform that estimates, as a percentage, how severe a patient's COVID-19 is likely to be, using the viral mutation signature and the patient's age as inputs. The portal grew out of a study conducted by the US-based Cold Spring Harbor Laboratory that statistically linked mutations in the SARS-CoV-2 genome to critical disease outcomes.
The study used an automated machine learning approach, taking as input 1594 viral genomes (half from severe cases and half from mild ones) together with their available clinical follow-up data. The SARS-CoV-2 nucleic acid sequences were collected from the GISAID virus repository, and the mutations were extracted with the CoVsurver analysis tool. The Wuhan strain (hCoV-19/Wuhan/WIV04/2019) served as the reference sequence.
Describing its methodology, the study states: “The UTR mutations were extracted from the multiple alignments of underlying sequences by comparing the target sequence to the reference sequence. The multiple alignments were constructed using the MAFFT software tool, and substitutions occurring in at least ten genomes were selected for further analysis. The protein mutations were exported in protein alteration format, non-protein (i.e. UTR) mutations were exported in nucleotide mutation format.”
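To make the "compare against the reference, keep substitutions seen in at least ten genomes" step concrete, here is a minimal Python sketch. It assumes the sequences are already aligned (for example by MAFFT) to the same length as the reference; the function names and the toy sequences are hypothetical illustrations, not part of the study's or CoVsurver's tooling.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical support threshold mirroring the study's "at least ten genomes" rule.
MIN_GENOMES = 10


def substitutions(reference: str, target: str) -> set[str]:
    """Return substitutions of `target` vs `reference`, e.g. 'T8A' (1-based column).

    Both sequences must already be aligned to the same length. Columns with a gap
    ('-') or an ambiguous base ('N') on either side are skipped. Column numbers
    match reference coordinates only when the reference row contains no gaps.
    """
    if len(reference) != len(target):
        raise ValueError("sequences must be aligned to equal length")
    found = set()
    for pos, (ref_base, alt_base) in enumerate(zip(reference, target), start=1):
        if ref_base == alt_base or ref_base in "-N" or alt_base in "-N":
            continue
        found.add(f"{ref_base}{pos}{alt_base}")
    return found


def frequent_substitutions(reference: str, aligned: list[str],
                           min_genomes: int = MIN_GENOMES) -> set[str]:
    """Keep only substitutions observed in at least `min_genomes` genomes."""
    counts = Counter()
    for seq in aligned:
        counts.update(substitutions(reference, seq))
    return {mut for mut, n in counts.items() if n >= min_genomes}


# Toy example: T8A occurs in 10 genomes and is kept; A1T occurs in 3 and is dropped.
ref = "ACGTACGT"
genomes = ["ACGTACGA"] * 10 + ["TCGTACGT"] * 3
print(frequent_substitutions(ref, genomes))  # {'T8A'}
```

Counting each substitution at most once per genome (a set per sequence) is what makes the threshold mean "present in at least ten genomes" rather than "seen ten times".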
A combination of machine learning classification and feature selection algorithms was applied to the genomes to demonstrate that mutation signatures carry enough information to separate mild outcomes from severe ones. This was a two-step procedure. The first step classified the genomic mutation data using logistic regression and support vector machines combined with LASSO feature selection.
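The idea behind pairing logistic regression with LASSO is that the L1 penalty pushes the weights of uninformative mutations to exactly zero, so classification and feature selection happen together. The sketch below is a minimal, dependency-free illustration of L1-penalised logistic regression fitted by proximal gradient descent (ISTA), assuming each genome is encoded as a row of 0/1 mutation indicators and labelled 1 for severe, 0 for mild. It is not the study's pipeline; the penalty strength, learning rate, and mutation names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w: float, t: float) -> float:
    # Proximal operator of the L1 penalty: shrinks w toward zero by t.
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def lasso_logistic(X, y, lam=0.1, lr=0.5, epochs=2000):
    """Fit L1-penalised logistic regression by proximal gradient descent.

    X: list of rows of 0/1 mutation indicators; y: list of 0/1 labels (1 = severe).
    The intercept b is not penalised.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        # Gradient of the mean logistic loss at the current (w, b).
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        b -= lr * grad_b
        # Gradient step on the weights, then L1 shrinkage (the LASSO part).
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(p)]
    return w, b


def selected_features(w, names):
    """Mutations whose weights survived the L1 penalty, ranked by |weight|."""
    kept = [(abs(wj), name) for wj, name in zip(w, names) if wj != 0.0]
    return [name for _, name in sorted(kept, reverse=True)]


# Toy data: hypothetical mut_A tracks severity, mut_B is unrelated noise.
X = [[1, 0], [1, 1], [0, 0], [0, 1]] * 5
y = [1, 1, 0, 0] * 5
w, b = lasso_logistic(X, y)
print(selected_features(w, ["mut_A", "mut_B"]))  # ['mut_A']
```

On this toy data the noise mutation's weight stays at exactly zero, which is the behaviour that lets a LASSO-based model report a compact mutation signature rather than a weight for every mutation.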
The second step produced classification performance measures and a list of mutations ranked by their importance in separating severe outcomes from the rest. The final model's performance was estimated by repeated, stratified, tenfold cross-validation (CV), and the estimate was corrected for the optimistic bias of trying many model configurations using Bootstrap Bias Corrected CV (BBC-CV).
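Two ideas from this step can be sketched briefly. Stratified folds keep the severe/mild ratio roughly equal in every fold, and BBC-CV reuses the pooled out-of-fold predictions of all candidate configurations: on each bootstrap resample, the best configuration is picked on the in-bag samples and scored on the out-of-bag ones, so the winner is never graded on the data that chose it. The Python below is a minimal sketch under those assumptions, using accuracy as the metric; the function names are hypothetical and it does not reproduce the study's automated tooling.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = [0] * len(labels)
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's samples round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[i] = pos % k
    return folds


def repeated_stratified_folds(labels, k=10, repeats=5):
    """One stratified fold assignment per repetition, each with a different shuffle."""
    return [stratified_folds(labels, k, seed=r) for r in range(repeats)]


def accuracy(labels, preds, rows):
    return sum(labels[i] == preds[i] for i in rows) / len(rows)


def bbc_cv(labels, oof_preds, n_boot=500, seed=0):
    """Bootstrap Bias Corrected CV estimate of accuracy.

    oof_preds[c][i] is configuration c's out-of-fold prediction for sample i.
    For each bootstrap resample, the configuration with the best in-bag accuracy
    is selected and then scored on the out-of-bag samples only.
    """
    rng = random.Random(seed)
    n = len(labels)
    scores = []
    for _ in range(n_boot):
        in_bag = [rng.randrange(n) for _ in range(n)]
        out_bag = sorted(set(range(n)) - set(in_bag))
        if not out_bag:
            continue  # every sample was drawn; nothing left to score on
        best = max(range(len(oof_preds)),
                   key=lambda c: accuracy(labels, oof_preds[c], in_bag))
        scores.append(accuracy(labels, oof_preds[best], out_bag))
    return sum(scores) / len(scores)
```

In practice `oof_preds` would come from running every candidate configuration through the same repeated stratified folds; the bootstrap step then costs only re-scoring, not retraining.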
In its findings, the study identified 26 protein and UTR mutations associated with critical outcomes. The study notes, “The best classification algorithm uses a mutation signature of 22 mutations as well as patients’ ages as the input and shows high classification efficiency with an AUC of 0.94 and prediction accuracy of 87%. Finally, we established an online platform which is capable to use a viral sequence and the patient’s age as the input and provides a percentage estimation of disease severity.”
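The two reported figures measure different things. AUC is the probability that a randomly chosen severe case receives a higher predicted score than a randomly chosen mild case, independent of any cut-off, while accuracy depends on a chosen threshold. The short Python sketch below computes both on made-up scores; the 0.5 threshold is an assumption, since the study's cut-off is not stated here.

```python
def auc(labels, scores):
    """ROC AUC as P(score of a severe case > score of a mild case); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(labels, scores, threshold=0.5):
    """Fraction of cases classified correctly when score >= threshold means severe."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


labels = [1, 1, 0, 0]          # 1 = severe, 0 = mild (made-up cases)
scores = [0.9, 0.4, 0.6, 0.1]  # made-up predicted probabilities
print(auc(labels, scores))          # 0.75
print(accuracy_at(labels, scores))  # 0.5
```

Three of the four severe/mild pairs are ranked correctly (AUC 0.75), yet only half the cases land on the right side of 0.5, which is why a study typically reports both numbers.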
The online prediction platform, COVIDOUTCOME, was set up to provide a probability estimate of infection severity along with a qualitative indicator of how strong the prediction is. Its server accepts a genomic sequence in FASTA format, together with the patient's age, and outputs:
- A list of protein and UTR mutations, and
- The probability that the genome will cause a severe infection
The output is presented in graphical as well as numerical form.
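The input/output flow described above can be pictured with a small Python sketch: parse FASTA text, list the substitutions of each record against a reference, and turn the mutations plus age into a probability with a logistic function. Everything model-specific here is an assumption: the weights, intercept, and mutation names are hypothetical placeholders, not COVIDOUTCOME's parameters, and the records are assumed to be pre-aligned to the reference.

```python
import math

# HYPOTHETICAL model parameters for illustration only; NOT the COVIDOUTCOME model.
WEIGHTS = {"T8A": 0.8, "C2G": 0.5}  # mutation -> log-odds contribution
AGE_WEIGHT = 0.04                   # log-odds added per year of age
INTERCEPT = -3.0


def read_fasta(text):
    """Parse FASTA text into a {header: sequence} dict (multi-line records allowed)."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


def detect_mutations(reference, aligned):
    """List substitutions of a pre-aligned sequence vs. the reference, e.g. 'T8A'."""
    if len(reference) != len(aligned):
        raise ValueError("sequence must be aligned to the reference length")
    return [f"{r}{i}{a}" for i, (r, a) in enumerate(zip(reference, aligned), start=1)
            if r != a and r not in "-N" and a not in "-N"]


def predict(fasta_text, reference, age):
    """Return {header: (mutations, probability of severe outcome)}."""
    results = {}
    for header, seq in read_fasta(fasta_text).items():
        muts = detect_mutations(reference, seq)
        z = INTERCEPT + AGE_WEIGHT * age + sum(WEIGHTS.get(m, 0.0) for m in muts)
        results[header] = (muts, 1.0 / (1.0 + math.exp(-z)))
    return results


print(predict(">s1\nACGTACGA\n>s2\nACGTACGT\n", "ACGTACGT", age=50))
```

A real service would also need to align each submitted genome to the reference before comparing, which this sketch deliberately leaves out.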
Although the study has yet to be peer-reviewed, it has put forward a working tool for real-time analysis of new, mutated viral genomes.
It is a major advance because it raises the question of whether mutation signatures of the SARS-CoV-2 virus can serve as an indicator of infection severity. The crucial test will be how much confidence the study can earn from scientific experts, and how much refinement and evaluation it must undergo before it is rolled out for clinical use.
Source: indiaai.gov.in