A Simple Free-Text-like Method for Extracting Semi-Structured Data from Electronic Health Records: Exemplified in Prediction of In-Hospital Mortality

Eyal Klang

Matthew A. Levin

Shelly Soffer

Alexis Zebrowski

Benjamin S. Glicksberg

Brendan G. Carr

Jolion McGreevy

David L. Reich and Robert Freeman

Abstract

Epic is a commonly used electronic health record (EHR) system in the United States. It contains large semi-structured "flowsheet" fields, which lack a well-defined data dictionary and are unique to each site. We evaluated a simple free-text-like method for extracting these data. As a use case, we demonstrate the method in predicting mortality at emergency department (ED) triage. We retrieved demographic and clinical data for ED visits from the Epic EHR (1/2014–12/2018). The data included structured fields, semi-structured flowsheet records, and free-text notes. The study outcome was in-hospital death within 48 h. Most of the data were coded using a free-text-like Bag-of-Words (BoW) approach. Two machine-learning models were trained: gradient boosting and logistic regression. Term frequency-inverse document frequency weighting was applied in the logistic regression model (LR-tf-idf). An ensemble of LR-tf-idf and gradient boosting was also evaluated. Models were trained on data from 2014–2017 and tested on data from 2018. Among 412,859 visits, the 48-h mortality rate was 0.2%. LR-tf-idf showed an AUC of 0.98 (95% CI: 0.98–0.99). Gradient boosting showed an AUC of 0.97 (95% CI: 0.96–0.99). The ensemble of both showed an AUC of 0.99 (95% CI: 0.98–0.99). In conclusion, a free-text-like approach can be useful for extracting knowledge from large amounts of complex semi-structured EHR data.
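The free-text-like coding described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the flowsheet field names and values below are hypothetical, the helper names (`flowsheet_to_tokens`, `bag_of_words`, `tf_idf`) are invented for illustration, and the smoothed idf formula is one common variant chosen here as an assumption. The idea shown is that each field/value pair is treated as a single "word", so no site-specific data dictionary is needed before building Bag-of-Words and tf-idf features.

```python
import math
import re
from collections import Counter


def flowsheet_to_tokens(record):
    """Turn one visit's flowsheet entries into free-text-like tokens.

    Each (field, value) pair becomes one token such as
    'pain_score=8'; whitespace is collapsed to underscores.
    """
    tokens = []
    for field, value in record.items():
        name = re.sub(r"\s+", "_", str(field).strip().lower())
        val = re.sub(r"\s+", "_", str(value).strip().lower())
        tokens.append(f"{name}={val}")
    return tokens


def bag_of_words(visits):
    """Count token occurrences per visit (one Counter per visit)."""
    return [Counter(flowsheet_to_tokens(v)) for v in visits]


def tf_idf(bags):
    """Weight each token by term frequency times a smoothed idf.

    Assumed variant: idf = ln((1 + N) / (1 + df)) + 1, where N is the
    number of visits and df the number of visits containing the token.
    """
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    weighted = []
    for bag in bags:
        total = sum(bag.values())
        weighted.append({
            tok: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1.0)
            for tok, count in bag.items()
        })
    return weighted


# Hypothetical triage flowsheet rows; real field names differ by site.
visits = [
    {"Mental Status": "alert", "Pain Score": 2, "O2 Device": "room air"},
    {"Mental Status": "unresponsive", "Pain Score": 0, "O2 Device": "ventilator"},
    {"Mental Status": "alert", "Pain Score": 8, "O2 Device": "room air"},
]

bags = bag_of_words(visits)
weights = tf_idf(bags)
```

In this toy data, a token that appears in only one visit (for example `mental_status=unresponsive`) receives a higher weight than a token shared by several visits. Weighted vectors of this kind are what a logistic regression model would take as input; the gradient boosting model and the ensemble are outside the scope of this sketch.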

Keywords

electronic health records - machine learning - gradient boosting

ISSUE

Volume: 5 Issue: 3 (2021)

SUBJECTS

INFRASTRUCTURE

SIMILAR JOURNALS

Future Internet
Big Data and Cognitive Computing
