Abstract
Background: Sepsis is one of the major causes of in-hospital death and is frequent in patients presenting to the emergency department (ED). Early identification of high-risk septic patients is critical. Machine learning (ML) techniques have been proposed for the identification and prognostication of septic ED patients, but these models often lack pre-hospital data and validation against early sepsis identification scores (such as qSOFA) and scores for critically ill patients (SOFA, APACHE II). Methods: We conducted an electronic health record (EHR) study to test whether interpretable and scalable ML models predict mortality in septic ED patients, and compared their performance with clinical scores. Consecutive adult septic patients admitted to the ED over 18 months were included. We built ML models ranging from a simple classifier to unbalanced and balanced logistic regression and random forest, and compared their performance with the qSOFA, SOFA, and APACHE II scores. Results: We included 425 sepsis patients after screening 38,500 EHRs for sepsis criteria. Overall mortality was 15.2% and peaked in patients coming from retirement homes (38%). Random forest (0.813), like balanced (0.811) and unbalanced (0.863) logistic regression, identified patients at risk of mortality. All ML models outperformed the qSOFA, APACHE II, and SOFA scores. Age, mean arterial pressure, and serum sodium were the major predictors of mortality. Conclusions: We confirmed that random forest models outperform previous models, including qSOFA, SOFA, and APACHE II, in identifying septic patients at higher mortality risk, while maintaining good interpretability. ML models may see wider adoption as EHR data become more widespread and granular, offering greater scalability than standard statistical techniques.
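The distinction between "unbalanced" and "balanced" logistic regression can be illustrated with a minimal sketch. With mortality around 15%, deaths are the minority class, and one common way to balance a classifier is to weight each class inversely to its frequency. The abstract does not state how balancing was done, so the weighting scheme, the function name `balanced_class_weights`, and the cohort below are assumptions chosen for illustration, not the study's method or data.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Hypothetical helper for illustration; the study's actual balancing
    strategy is not described in the abstract.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Made-up cohort (NOT the study data): 100 patients, 15 deaths (label 1).
labels = [1] * 15 + [0] * 85
weights = balanced_class_weights(labels)
print(weights)
```

Under this scheme each death contributes more to the training loss than each survivor, so the fitted model is less inclined to predict survival for everyone. An unbalanced model corresponds to giving every patient a weight of 1.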