Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- AI classifiers can accurately predict SARSCoV2 infection status
- 67,842 individuals with linked metadata were studied, 23,514 tested positive for SARS CoV 2
- Subjects were recruited via the UK governments National Health Service Test-and-Trace programme and the REACT randomised surveillance survey
- AI classifiers predict SARS-CoV-2 infection status with high accuracy
- After adjusting for confounders, classifier performance is weaker
- Audio based classifiers are outperformed by simple predictive scores based on user reported symptoms
Paper Content
Results
Study design
- Invited volunteers to participate in study from March 2021 to March 2022
- Collected audio recordings of four respiratory audio modalities
- Final dataset consisted of 23,514 COVID + and 44,328 SARS-CoV-2 PCR-negative (COVID − ) individuals
- Acoustic target should be causally linked to COVID-19
- Acoustic target should not be self-identifiable
- Acoustic target should enable high-utility COVID-19 screening
Characterising and controlling recruitment bias
- Audio-based COVID-19 classification results can be affected by the characteristics of the enrolled population.
Primary analyses
- Pre-specified analysis plan was designed and fixed to increase replicability of conclusions
- Audio-based COVID-19 prediction performance was presented in Table 1
- SSAST and BNN classifiers outperformed the baseline SVM
- Under the participant disjoint Randomised data split, SSAST classifier achieved high COVID-19 predictive accuracy of ROC-AUC=0.846
- When controlling for enrolment bias, predictive accuracy dropped to ROC-AUC=0.619
- Significant differences in predictive scores between COVID − and COVID + individuals in 28 strata were observed
Confirmatory analyses and validation
- Audio-based classifiers can be useful if they improve performance compared to classifiers based on self-identifiable symptoms.
- Test sets should reflect real-life settings.
- Balanced subsampling used to create “general population” test set.
- Benchmarking audio-based classifier against classifiers based on self-identifiable symptoms.
- Audio-based classifier offers small increase in predictive accuracy.
- Utility of classifier depends on context.
- Illustrative utility function specified.
- Exploratory methods used to identify influence of unmeasured confounders.
- Residual predictive variation persists after mapping to COVID - individuals.
- Points to unmeasured confounder bias contributing to classifier performance.
Discussion
- Collected largest PCR-validated dataset of its kind to date
- Quantified accuracy of audio-based classifiers for predicting COVID-19
- High accuracy before accounting for recruitment bias (ROC-AUC=0.85)
- Little residual predictive variation after controlling for recruitment bias (ROC-AUC=0.62)
- Self-reported symptoms outperform audio-based AI classifiers
- Recruitment bias can artificially inflate association between COVID-19 and its symptoms
- Audio-based classifiers can augment and complement self-screening
- Collect and disseminate metadata to filter data for quality and relevance
- Characterise and control recruitment bias
- Design studies with bias control in mind
- Focus on added predictive value of classifiers
- Assess classifier performance in targeted settings
- Examine classifier’s expected utility in applied setting
- Out-of-study replication
Methods
Dataset collection and characteristics
- Three papers accompany this work
- Main sources of recruitment were REACT study and NHS T+T system
- Enrolment was opt-in
- Participants asked to record four audio clips
- Demographic and clinical/health metadata collected
- Final dataset of 23,514 COVID + and 44,328 COVID − individuals
- Data split into three training sets and five test sets
Machine learning models
- Three models were implemented to detect COVID-19 from audio
- Models cover a range of machine learning research
- Baseline model uses 6,373 hand crafted features
- BNN model uses ResNet-50 and estimates uncertainty
- Features for BNN are Mel filterbank
- SSAST model uses transformers
Matching methodology.
- Constructed 1:1 Matched test set with 907 COVID+ and 907 COVID- participants
- Constructed Matched training set with 2,599 COVID+ and 2,599 COVID- participants
- Considered action of applying testing protocol to individual randomly selected from population and used combined utility of consequences of outcome
Data availability
- Dataset can be requested through UK Data Service
- Audio data provided in .wav format
- Metadata provided in .csv files
- Data is anonymised and safeguarded
- Authentication process required for access
Exploratory approaches to identify the influence of unmeasured confounders
- Matching can be used to control for bias in data.
- Unmeasured confounders can lead to inflated classification performance.
- Two exploratory approaches are introduced to quantify the effects of unmeasured confounding.
- Weak-Robust approach uses a low capacity model to identify cases with confounding signal.
- Calibration step evaluates the weak model on an easier task than COVID-19 detection.
- Project COVID+ and COVID- openSMILE features onto the first k principal components of the COVID- cases.
- Remove correctly classified individuals from Curated Matched test set.
- Evaluate SSAST model on Curated Matched test set.
- Graphical representation of enrolment effects.
- Predictive accuracy within Matched strata.
- Schematic demonstrating the importance of ascertainment bias adjustment.
- Symptomatic vs Asymptomatic for other COVID-19 datasets.
- Results of the Weak-Robust approach.
- SSAST performance when trained and evaluated on the COVID-19 sounds publicly available dataset.
- TSNE plots of the the final layer representations of the SSAST.
- Calibration plot for the SSAST for each respiratory modality.