According to a study involving statisticians from the Department of Mathematics at King’s, Artificial Intelligence (AI) cannot effectively identify whether someone has Covid-19 from the sound of their cough alone.
Research examining the accuracy of Machine Learning (ML) algorithms found that a system using ML to detect Covid-19 infections from audio did not outperform a model that simply used individuals’ self-reported symptoms combined with demographic data, such as age and gender.
Researchers undertook an independent review of how ML algorithms performed as a Covid-19 screening tool, to ascertain whether AI classifiers could be used as a potential replacement for lateral flow tests, one that could be less expensive, less environmentally wasteful, and more accurate. The review was commissioned by the UK Health Security Agency as part of the government’s pandemic response.
The project was overseen by the Alan Turing Institute and the Royal Statistical Society and involved a group of researchers from the University of Oxford, Imperial College London, and University College London, as well as Professor Steven Gilmour, Dr. Davide Pigoli, Dr. Vasiliki Koutra, and Kieran Baker from the Department of Mathematics at King’s.
The team gathered and analyzed a dataset of audio recordings from 67,842 people who had also taken a PCR test and had been recruited in part through the NHS Test-and-Trace program and the REACT-1 study. Participants were instructed to record their breathing, talking, and coughing. Around 23,000 of them tested positive for Covid-19, according to the results of their PCR tests.
Researchers then used the audio recordings to develop a machine learning model and compared its predictions against participants’ Covid-19 test results. At first, in an unadjusted analysis of the data, the AI classifiers appeared to predict Covid-19 infection with high accuracy. These initial findings were consistent with earlier studies, including research by the Massachusetts Institute of Technology, which reported up to 98.5 percent accuracy for AI classifiers predicting whether someone had Covid-19 from audio recordings.
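In broad strokes, such a pipeline pairs features derived from each participant’s recordings with their PCR result, trains a classifier, and scores it on held-out participants. The sketch below illustrates only the shape of that unadjusted evaluation; it is not the study’s code, the features and labels are randomly generated stand-ins with an arbitrary synthetic link between them, and logistic regression is simply one plausible model choice.

```python
# Illustrative sketch only, not the study's pipeline: train a classifier on
# audio-derived features labelled by PCR results and score it without any
# adjustment for who the positives and negatives are.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_participants, n_features = 5_000, 20

# Stand-in "audio" features; in practice these would be summaries (spectral,
# cepstral, etc.) extracted from each participant's breathing, speech and cough.
X = rng.normal(size=(n_participants, n_features))

# Stand-in PCR labels with an arbitrary synthetic link to the features, purely
# so the sketch has something to learn; real labels come from the PCR results.
w = rng.normal(size=n_features)
y = rng.random(n_participants) < 1.0 / (1.0 + np.exp(-(X @ w)))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# "Unadjusted" evaluation: every held-out recording is scored as-is, with no
# attempt to balance age, gender, or symptoms between positives and negatives.
print("unadjusted ROC AUC:",
      round(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]), 3))
```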
But after further analysis, the findings from this most recent study showed a different picture.
“When we grouped participants of the same age, gender, and symptoms into pairs, with only one of each pair having Covid-19, and evaluated the models on this matched data, the AI models failed to perform well in terms of accuracy,” said Kieran Baker, PhD student at King’s College London and research assistant at the Alan Turing Institute.
“The accuracy seems to be a result of confounding, a statistical effect. The model learns that audible symptoms in the recording are a proxy for Covid-19 infection, and that the absence of respiratory symptoms suggests the absence of Covid-19. Nearly all of the individuals in our sample who had Covid-19 had some symptoms, so the model over-diagnosed Covid-19 cases. Because Test and Trace recruited only people with symptoms, the sample was not representative of the general population, and that is what leads to the confounding,” he explained.
“Our results show that, in real-world contexts, audio-based AI classifiers were not able to outperform straightforward predictions based on users’ self-reported symptoms.”
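To make the confounding argument concrete, here is a minimal sketch in Python. It is not the study’s code, models, or data: the “audio” features are synthetic numbers that encode only whether a participant is symptomatic, and the 0.7 and 0.05 infection rates and the 1.5 feature shift are arbitrary assumptions chosen to mimic a recruitment-biased sample. It shows how an unadjusted evaluation can make an audio classifier look accurate even though it never beats a symptoms-only baseline, and how a matched evaluation exposes the shortcut.

```python
# Minimal illustrative sketch (synthetic data, not the study's) of symptom
# confounding: the "audio" features below encode only whether a person is
# symptomatic, never the infection itself, yet the unadjusted score looks
# strong; matching positives and negatives within symptom groups removes it.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 20_000

# Recruitment-biased sample: symptomatic people are far more likely to test positive.
symptomatic = rng.random(n) < 0.5
covid = rng.random(n) < np.where(symptomatic, 0.7, 0.05)

# Stand-in "audio" features that reflect audible symptoms, not infection.
X_audio = rng.normal(size=(n, 20)) + 1.5 * symptomatic[:, None]
X_symptoms = symptomatic[:, None].astype(float)      # symptoms-only baseline

train = rng.random(n) < 0.7
test = ~train
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio[train], covid[train])
symptom_clf = LogisticRegression().fit(X_symptoms[train], covid[train])

audio_scores = audio_clf.predict_proba(X_audio[test])[:, 1]
symptom_scores = symptom_clf.predict_proba(X_symptoms[test])[:, 1]
y, sym = covid[test], symptomatic[test]

# Unadjusted evaluation (ROC AUC is a threshold-free measure of accuracy):
# the audio model looks good, but no better than the symptoms-only baseline,
# because both are really detecting "does this person sound symptomatic?".
print("unadjusted AUC, audio model  :", round(roc_auc_score(y, audio_scores), 3))
print("unadjusted AUC, symptoms only:", round(roc_auc_score(y, symptom_scores), 3))

# Matched evaluation: within each symptom group, keep equal numbers of
# Covid-positive and Covid-negative participants, so symptom status can no
# longer act as a proxy for infection.
keep = np.zeros(y.size, dtype=bool)
for s in (True, False):
    pos = np.flatnonzero((sym == s) & y)
    neg = np.flatnonzero((sym == s) & ~y)
    k = min(pos.size, neg.size)
    keep[pos[:k]] = True
    keep[neg[:k]] = True

print("matched AUC, audio model     :",
      round(roc_auc_score(y[keep], audio_scores[keep]), 3))
```

With this synthetic setup, the two unadjusted scores typically come out almost identical and well above chance, while the matched score collapses to roughly 0.5: the apparent skill of the audio model comes from detecting symptoms, not from anything Covid-specific in the sound.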
Although the study did not produce a new screening tool for Covid-19 or other diseases that require screening and diagnosis, the researchers were able to introduce new techniques for characterizing complex, high-dimensional bias and to offer best-practice recommendations for handling recruitment bias. The results also provide new insight into how to evaluate the usefulness of audio-based classifiers in relevant real-world contexts.
Professor Steven Gilmour, Head of the Department of Mathematics, stated:
“This study is timely in that it emphasizes the need for prudence when developing machine learning evaluation techniques that are intended to produce meaningful performance indicators. Many applications in AI, where biases are frequently difficult to detect and difficult to correct for, can benefit from the crucial insights learned from this case study on the impacts of confounding.”

Kieran Baker said:
“Future systems employing AI audio classifiers like this one are still conceivable. Recent studies demonstrate improvements in the ability to identify chronic obstructive pulmonary disease (COPD) and sleep apnea from audio recordings.
“But it is crucial that these models, together with the data, go through thorough model development and testing so we can be certain that they really work as we anticipate.”