Antibiotic resistance is a growing global health threat that reduces the efficacy of treatments for bacterial infections. At the same time, the growing availability of Electronic Health Records (EHRs), which capture patient histories, lets us use these data to help predict antibiotic susceptibility. In this study, we use pre-trained foundation models as a potential antibiotic stewardship tool: one that helps combat antibiotic resistance and provides clinical decision support backed by a data-driven approach.
In previous work, we introduced a methodology for transforming EHRs into text, which we then used as input to pre-trained language models for predicting antibiotic susceptibility. Using the MIMIC-IV dataset, which contains de-identified health data, we built a cohort of patients presumed to have a Staphylococcus (staph) infection. The table below details our patient cohort:
| Description | Category | train | test | totals |
|---|---|---|---|---|
| Prescriptions, n | total | 4803 | 1173 | 5976 |
| Unique IDs, n | total | 3283 | 878 | 4161 |
| Age | mean (SD) | 59 (17) | 58 (17) | |
| Sex, n | female | 1341 | 351 | 1692 |
| | male | 1942 | 527 | 2469 |
| Race/Ethnicity, n | White | 2212 | 583 | 2795 |
| | Black | 416 | 119 | 535 |
| | Other | 401 | 96 | 497 |
| | Hispanic/Latino | 150 | 55 | 205 |
| | Asian | 88 | 20 | 108 |
| | Unable to obtain | 12 | 3 | 15 |
| | Native Hawaiian | 4 | 2 | 6 |
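The EHR-to-text transformation itself is described in our previous work. Purely to illustrate the general idea, the sketch below serializes a toy patient record (demographics plus timestamped events) into one chronological string that a language model can take as input. The `Patient`/`Event` schema, the field names, and the sentence template are hypothetical; they are not the MIMIC-IV tables or our published template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    """One timestamped EHR entry (hypothetical schema, not MIMIC-IV's)."""
    time: str   # ISO-8601 timestamp in one fixed format, so string order == time order
    kind: str   # e.g. "diagnosis", "lab", "medication"
    value: str


@dataclass
class Patient:
    """Toy patient record: demographics plus a list of events (hypothetical)."""
    age: int
    sex: str
    race: str
    events: List[Event] = field(default_factory=list)


def patient_to_text(p: Patient) -> str:
    """Serialize a patient history into a single plain-text string."""
    header = f"Patient is a {p.age}-year-old {p.sex}, race/ethnicity: {p.race}."
    # Sort events chronologically so the text reads as the history unfolded.
    body = [f"{e.time} {e.kind}: {e.value}." for e in sorted(p.events, key=lambda e: e.time)]
    return " ".join([header, *body])
```

Ordering events by time is a design choice that keeps the narrative readable; in practice the template and which event types to include would be tuned to the model's context length.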
In this work we used the Hugging Face platform to access pre-trained language models. We used the following models:
This study covers eight antibiotics. The table below lists, for each antibiotic, the number of samples in the train and test splits and its prevalence in our dataset.
| Category | Antibiotic | train | test | totals | Prevalence (%) |
|---|---|---|---|---|---|
| Antibiotics | Clindamycin | 2645 | 624 | 3269 | 54.6838 |
| | Erythromycin | 2626 | 639 | 3265 | 3.5141 |
| | Gentamicin | 4549 | 1127 | 5676 | 54.6352 |
| | Levofloxacin | 2866 | 715 | 3581 | 94.9799 |
| | Oxacillin | 2702 | 667 | 3369 | 6.4759 |
| | Tetracycline | 3747 | 909 | 4656 | 39.9598 |
| | Trimethoprim/sulfa | 3671 | 908 | 4579 | 77.9116 |
| | Vancomycin | 2529 | 611 | 3140 | 76.6232 |
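The foundation models in this work were accessed through Hugging Face and used as feature extractors. As a minimal sketch of that step, the code below loads a model with the `transformers` library and mean-pools its final hidden states into one vector per text. Mean pooling, the `max_length` of 512, and the default model name `bert-base-uncased` are assumptions for illustration only, not the models or settings used in this study.

```python
import numpy as np


def mean_pool(last_hidden_state: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors per text, ignoring padding positions.

    last_hidden_state: (batch, seq_len, hidden); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(last_hidden_state.dtype)
    summed = (last_hidden_state * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def embed_texts(texts, model_name="bert-base-uncased", max_length=512):
    """Embed texts with a Hugging Face model (model name is a placeholder assumption)."""
    # Imported lazily so mean_pool can be used without torch/transformers installed.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tokenizer(texts, padding=True, truncation=True,
                      max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    return mean_pool(outputs.last_hidden_state.numpy(), batch["attention_mask"].numpy())
```

Note that `truncation=True` silently drops tokens beyond `max_length`, so long patient histories lose their later events unless the text is shortened or chunked first.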
We begin with the results of our study. Figure 1 shows the area under the receiver operating characteristic curve (AUROC) and Figure 2 shows the area under the precision-recall curve (AUPRC), our two evaluation metrics. In both figures, models are ranked from best (top) to worst (bottom).
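Both metrics can be computed directly from a model's predicted scores. The dependency-free sketch below computes AUROC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one (ties count as half), and estimates AUPRC as average precision, one common step-wise estimator. Average precision is an assumption here, not necessarily the estimator behind Figure 2; scikit-learn's `roc_auc_score` and `average_precision_score` give the same values with faster implementations.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:  # O(n_pos * n_neg): fine for a sketch, slow for very large sets
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum over thresholds of (recall gain) * precision."""
    n_pos = sum(1 for y in y_true if y == 1)
    if n_pos == 0:
        raise ValueError("AUPRC needs at least one positive label")
    # Visit samples from highest to lowest score.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    ap, tp, fp, prev_recall = 0.0, 0, 0, 0.0
    i = 0
    while i < len(ranked):
        # Tied scores share one threshold, so consume the whole tie group at once.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
        i = j
    return ap
```

A random classifier scores about 0.5 AUROC regardless of class balance, but its expected AUPRC is roughly the positive-class prevalence, so AUPRC for each antibiotic is best read against that antibiotic's prevalence.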
The best-performing model varies across antibiotics. This suggests that each foundation model may be optimized for the specific tasks on which it claims to be the new "state of the art", and that such claims may not carry over to antibiotic susceptibility prediction. We therefore advise anyone using foundation models as feature representation methods to benchmark several candidates on their own task, since no single model is a clear winner here.
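That benchmarking advice can be made concrete by ranking every model per antibiotic and checking whether the same model always comes first. The helper below is a minimal sketch that assumes per-antibiotic scores (for example AUROC, higher is better) are already collected in a nested dictionary; the function names and data layout are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

# antibiotic -> model name -> metric value (higher is better)
Scores = Dict[str, Dict[str, float]]


def rank_models(scores: Scores) -> Dict[str, List[str]]:
    """Per antibiotic, model names ordered from best (first) to worst (last)."""
    return {ab: sorted(per_model, key=per_model.get, reverse=True)
            for ab, per_model in scores.items()}


def summarize(scores: Scores) -> Tuple[Dict[str, str], Dict[str, float]]:
    """Return the winner per antibiotic and each model's mean rank (1 = best)."""
    rankings = rank_models(scores)
    winners = {ab: order[0] for ab, order in rankings.items()}
    positions: Dict[str, List[int]] = {}
    for order in rankings.values():
        for pos, model in enumerate(order, start=1):
            positions.setdefault(model, []).append(pos)
    mean_rank = {model: mean(p) for model, p in positions.items()}
    return winners, mean_rank
```

If `winners` contains more than one distinct model and the mean ranks are close, no single model can be declared best overall; a bootstrap over test patients would further show whether the gaps between models are meaningful.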
In a follow-up study, we intend to fine-tune these foundation models on our dataset to see whether fine-tuning improves their performance and whether a clear winner with a state-of-the-art embedding emerges.
If there are any questions or concerns, please feel free to reach out to us at heliohalperin@gmail.com or simonlee711@g.ucla.edu.