Resumen
Predicting ICU readmission risk will help physicians make decisions regarding discharge. We used discharge summaries to predict ICU 30-day readmission risk using text mining and machine learning (ML) with data from the Medical Information Mart for Intensive Care III (MIMIC-III). We used Natural Language Processing (NLP) and the Bag-of-Words approach on discharge summaries to build a Document-Term-Matrix with 3000 features. We compared the performance of support vector machines with the radial basis function kernel (SVM-RBF), adaptive boosting (AdaBoost), quadratic discriminant analysis (QDA), least absolute shrinkage and selection operator (LASSO), and Ridge Regression. A total of 4000 patients were used for model training and 6000 were used for validation. Using the bag-of-words determined by NLP, the area under the receiver operating characteristic (AUROC) curve was 0.71, 0.68, 0.65, 0.69, and 0.65 correspondingly for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. We then used the SVM-RBF model for feature selection by incrementally adding features to the model from 1 to 3000 bag-of-words. Through this exhaustive search approach, only 825 features (words) were dominant. Using those selected features, we trained and validated all ML models. The AUROC curve was 0.74, 0.69, 0.67, 0.70, and 0.71 respectively for SVM-RBF, AdaBoost, QDA, LASSO, and Ridge Regression. Overall, this technique could predict ICU readmission relatively well.