Benchmarking Foundation Models for Antibiotic Susceptibility Prediction

Helio Halperin1    Yanai Halperin1    Simon A. Lee2    Jeffrey N. Chiang2   
1Santa Monica High School 2 UCLA Computational Medicine 2024

DALL-E's attempt at visualizing the research we performed. Makes our work look a lot cooler...

Abstract

The rise of antibiotic-resistant bacteria has been identified as a critical global healthcare crisis that compromises the efficacy of essential antibiotics. This crisis is largely driven by the inappropriate and excessive use of antibiotics, which leads to increased bacterial resistance. In response, clinical decision support systems integrated with electronic health records (EHRs) have emerged as a promising solution. These systems employ machine learning models to improve antibiotic stewardship by providing actionable, data-driven insights. This study therefore evaluates pre-trained language models for predicting antibiotic susceptibility, using several open-source models available on the Hugging Face platform. Despite the abundance of models and ongoing advancements in the field, there is still no consensus on the most effective model for encoding clinical knowledge.

Antibiotic Resistance

Antibiotic resistance is a growing global health threat, reducing the efficacy of treatments against bacterial infections. However, with the growing ubiquity of clinical data, we can leverage Electronic Health Records (EHRs), which capture patient histories, to help predict antibiotic susceptibility. In this study, we therefore use pre-trained foundation models as part of an antibiotic stewardship effort, combating antibiotic resistance with clinical decision support backed by a data-driven approach.

Transforming Electronic Health Records into text

In previous work, we introduced a methodology for transforming Electronic Health Records (EHRs) into text, which we then used as input to pre-trained language models for predicting antibiotic susceptibility. We used the MIMIC-IV dataset, which contains de-identified health data, and built a patient cohort composed of patients presumed to have a Staphylococcus aureus infection. We detail our patient cohort in the table below:
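As a minimal sketch of this serialization step (the field names and the "key: value" template here are hypothetical illustrations, not the exact format from our pipeline), each tabular EHR record can be flattened into a sentence-like string before tokenization:

```python
# Sketch: flatten one tabular EHR record into plain text for a language model.
# Field names and the joining template are illustrative assumptions.

def serialize_record(record: dict) -> str:
    """Turn one patient record into a single text string."""
    parts = [f"{key.replace('_', ' ')}: {value}" for key, value in record.items()]
    return "; ".join(parts)

row = {"age": 59, "sex": "female", "culture_site": "blood"}
print(serialize_record(row))  # age: 59; sex: female; culture site: blood
```

The resulting string is what gets tokenized and passed to the pre-trained model.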

Description          Category         train    test     totals
Prescriptions, n     total            4803     1173     5976
Unique IDs, n        total            3283     878      4161
Age, mean (SD)                        59 (17)  58 (17)
Sex, n               female           1341     351      1692
                     male             1942     527      2469
Race/Ethnicity, n    White            2212     583      2795
                     Black            416      119      535
                     Other            401      96       497
                     Hispanic/Latino  150      55       205
                     Asian            88       20       108
                     Unable           12       3        15
                     Native Hawaiian  4        2        6

Modeling Setup

In this work we used the Hugging Face platform to access pre-trained language models. We used the following models:

[Figure: the pre-trained language models evaluated in this study]
In this study, we freeze the pre-trained model parameters and use the resulting embeddings as input to a light gradient boosting machine (LGBM) model. We frame each antibiotic susceptibility prediction task as an independent binary classification. We use the Area under the Receiver Operating Characteristic curve (AUROC) and the Area under the Precision-Recall Curve (AUPRC) as our evaluation metrics.
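The setup above can be sketched as follows. Here random arrays stand in for the frozen model's token embeddings (in practice they would come from a Hugging Face model's last hidden state), and the pooled vectors are what we feed to the downstream classifier:

```python
import numpy as np

# Sketch of the frozen-embedding step: mean-pool a pre-trained model's token
# embeddings into one fixed-length vector per clinical note. The random
# arrays below are stand-ins for real model outputs.

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding positions."""
    mask = attention_mask[:, :, None]             # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)
    counts = mask.sum(axis=1).clip(min=1)         # avoid divide-by-zero
    return summed / counts

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 16, 768))            # 4 notes, 16 tokens, 768 dims
mask = np.ones((4, 16))
mask[:, 12:] = 0                                  # last 4 positions are padding
features = mean_pool(tokens, mask)                # one 768-d vector per note
print(features.shape)                             # (4, 768)

# Each antibiotic then gets its own binary classifier over `features`
# (in our setup, a lightgbm.LGBMClassifier), scored with AUROC and AUPRC.
```

Mean pooling is one common choice for sentence-level features; some models instead use the [CLS] token vector.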

Antibiotics & Their Prevalence

We studied eight antibiotics. The table below shows the prevalence of each in our dataset.

Category     Antibiotic          train   test   totals   Prevalence (%)
Antibiotics  Clindamycin         2645    624    3269     54.6838
             Erythromycin        2626    639    3265     3.5141
             Gentamicin          4549    1127   5676     54.6352
             Levofloxacin        2866    715    3581     94.9799
             Oxacillin           2702    667    3369     6.4759
             Tetracycline        3747    909    4656     39.9598
             Trimethoprim/sulfa  3671    908    4579     77.9116
             Vancomycin          2529    611    3140     76.6232

Results

We begin by presenting the results of our study. Figure 1 displays the Area under the Receiver Operating Characteristic curve and Figure 2 displays the Area under the Precision-Recall Curve. We rank our models from best (top) to worst (bottom).

[Figure 1: AUROC by model and antibiotic]
[Figure 2: AUPRC by model and antibiotic]
Below we present the results in a cleaner format. Figure 3 displays the Area under the Receiver Operating Characteristic curve and Figure 4 displays the Area under the Precision-Recall Curve, with models ranked from best (top) to worst (bottom) to show which model wins each antibiotic benchmark.
[Figure 3: AUROC, models ranked per antibiotic]
[Figure 4: AUPRC, models ranked per antibiotic]
We see that the best model varies across the different antibiotic objectives. However, these plots show that BiomedRoBERTa performs best on four of the eight antibiotics.

We therefore also present the average rank of each model across antibiotics, where SciBERT achieves the best average rank.
[Figure: average rank of each model across antibiotics]
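The average-rank summary can be computed as sketched below: rank the models within each antibiotic by score (1 = best), then average each model's ranks across antibiotics. The scores here are made-up numbers for illustration, not our actual results:

```python
# Sketch of the average-rank summary. Scores are illustrative placeholders,
# keyed as scores[model][antibiotic] = AUROC.
scores = {
    "SciBERT":       {"Clindamycin": 0.73, "Oxacillin": 0.64},
    "BiomedRoBERTa": {"Clindamycin": 0.68, "Oxacillin": 0.66},
    "ClinicalBERT":  {"Clindamycin": 0.71, "Oxacillin": 0.61},
}

def average_ranks(scores: dict) -> dict:
    """Average each model's per-antibiotic rank (1 = best); lower is better."""
    antibiotics = next(iter(scores.values())).keys()
    totals = {model: 0.0 for model in scores}
    for ab in antibiotics:
        ordered = sorted(scores, key=lambda m: scores[m][ab], reverse=True)
        for rank, model in enumerate(ordered, start=1):
            totals[model] += rank
    n = len(antibiotics)
    return {model: total / n for model, total in totals.items()}

print(average_ranks(scores))
# {'SciBERT': 1.5, 'BiomedRoBERTa': 2.0, 'ClinicalBERT': 2.5}
```

This simple summary can hide large per-task differences, which is why we report the per-antibiotic rankings as well.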

Discussion & Conclusion

From this study, we notice that the winning model varies across antibiotics. This suggests that foundation models may be optimized for the specific tasks on which they claim to be the new "state of the art". We therefore advise those who use foundation models as feature representation methods to run their own benchmarks, as no clear winner is conclusive.

As a follow-up, we intend to fine-tune these foundation models on our dataset to see whether we can improve their performance and whether a clear-cut winner with state-of-the-art embeddings emerges.

Questions

If there are any questions or concerns, please feel free to reach out to us at heliohalperin@gmail.com or simonlee711@g.ucla.edu.