ARTICLE
TITLE

Identifying Probable Dementia in Undiagnosed Black and White Americans Using Machine Learning in Veterans Health Administration Electronic Health Records

Yijun Shao    
Kaitlin Todd    
Andrew Shutes-David    
Steven P. Millard    
Karl Brown    
Amy Thomas    
Kathryn Chen    
Katherine Wilson    
Qing T. Zeng and Debby W. Tsuang    

Abstract

The application of natural language processing and machine learning (ML) to electronic health records (EHRs) may help reduce dementia underdiagnosis, but models that are not designed to reflect minority populations may instead perpetuate underdiagnosis. To improve the identification of undiagnosed dementia, particularly in Black Americans (BAs), we developed support vector machine (SVM) ML models to assign dementia risk scores based on features identified in unstructured EHR data (via latent Dirichlet allocation and stable topic extraction in n = 1 M notes) and structured EHR data. We hypothesized that separate models would show differentiation between racial groups, so the models were fit separately for BAs (n = 5 K with dementia ICD codes, n = 5 K without) and White Americans (WAs; n = 5 K with codes, n = 5 K without). To validate our method, scores were generated for separate samples of BAs (n = 10 K) and WAs (n = 10 K) without dementia codes, and the EHRs of 1.2 K of these patients were reviewed by dementia experts. All subjects were age 65+ and drawn from the VA, which meant that the samples were disproportionately male. A strong positive relationship was observed between SVM-generated risk scores and undiagnosed dementia. BAs were more likely than WAs to have undiagnosed dementia per chart review, both overall (15.3% vs. 9.5%) and among Veterans with >90th percentile cutoff scores (25.6% vs. 15.3%). With chart reviews as the reference standard and varied cutoff scores, the BA model performed slightly better than the WA model (BA model: AUC = 0.86, with negative predictive value [NPV] = 0.98, positive predictive value [PPV] = 0.26, sensitivity = 0.61, specificity = 0.92 and accuracy = 0.91 at the >90th percentile cutoff; WA model: AUC = 0.77, with NPV = 0.98, PPV = 0.15, sensitivity = 0.43, specificity = 0.91 and accuracy = 0.89 at the >90th percentile cutoff). Our findings suggest that race-specific ML models can help identify BAs who may have undiagnosed dementia.
Future studies should examine model generalizability in settings with more females and test whether incorporating these models into clinical settings increases the referral of undiagnosed BAs to specialists.
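
The abstract describes fitting separate SVM models for BAs and WAs, each trained to separate patients with and without dementia ICD codes, and then using each model to assign risk scores. As a minimal sketch of that per-group design only, the Python below trains one independent linear SVM per group with the Pegasos stochastic subgradient method on toy numeric vectors. The vectors merely stand in for the topic-based and structured EHR features; the function names (`train_linear_svm`, `risk_score`, `fit_per_group`), the hyperparameters (`lam`, `epochs`), the linear kernel and the use of Pegasos are all assumptions for illustration, not the authors' implementation.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def train_linear_svm(X: Sequence[Sequence[float]], y: Sequence[int],
                     lam: float = 0.1, epochs: int = 50, seed: int = 0) -> Vector:
    """Pegasos: stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 (e.g. has dementia ICD codes) or -1 (no codes).
    A constant 1.0 is appended to every input so the last weight acts as a
    (regularized) bias term. Hypothetical helper, not the paper's code.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size 1 / (lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1.0:  # hinge loss is active: step toward this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def risk_score(w: Vector, x: Sequence[float]) -> float:
    """Linear decision value w . [x, 1]; larger means further on the +1 side."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def fit_per_group(samples: Dict[str, Tuple[List[Vector], List[int]]]) -> Dict[str, Vector]:
    """Fit one independent model per group label (e.g. 'BA', 'WA')."""
    return {group: train_linear_svm(X, y) for group, (X, y) in samples.items()}


if __name__ == "__main__":
    # Toy one-feature data per group; real inputs would be EHR-derived features.
    models = fit_per_group({
        "BA": ([[2.0], [3.0], [-2.0], [-3.0]], [1, 1, -1, -1]),
        "WA": ([[1.0], [4.0], [-1.0], [-4.0]], [1, 1, -1, -1]),
    })
    for group, w in models.items():
        print(group, [round(risk_score(w, [v]), 3) for v in (3.0, -3.0)])
```

Keeping one weight vector per group mirrors the abstract's choice to fit the models separately rather than adding race as a single feature to one pooled model; scores from each group's model are then ranked only within that group.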
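
The performance figures reported at the >90th percentile cutoff all derive from a single 2x2 table: Veterans scoring above the cutoff are counted as flagged, and the flags are compared with the chart-review reference standard. The sketch below shows that arithmetic, with AUC computed in its Mann-Whitney form (the probability that a randomly chosen case outscores a randomly chosen non-case, ties counted as one half). The helper names are hypothetical, and the nearest-rank percentile rule is an assumed choice; the abstract does not state how the cutoff was computed.

```python
import math
from typing import Dict, Sequence


def percentile_cutoff(scores: Sequence[float], pct: float = 90.0) -> float:
    """Nearest-rank percentile: smallest score with at least pct% of scores <= it."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def metrics_at_cutoff(scores: Sequence[float], labels: Sequence[int],
                      cutoff: float) -> Dict[str, float]:
    """Flag score > cutoff as positive; labels are 1 (dementia per chart review) or 0.

    Ratios with a zero denominator are returned as NaN.
    """
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        flagged = s > cutoff
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Mann-Whitney ROC AUC: P(case score > non-case score), ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    toy_scores = list(range(1, 11))              # toy risk scores, not study data
    toy_labels = [0, 0, 0, 1, 0, 0, 0, 0, 1, 1]  # toy chart-review outcomes
    cut = percentile_cutoff(toy_scores, 90)
    print(cut, metrics_at_cutoff(toy_scores, toy_labels, cut), auc(toy_scores, toy_labels))
```

With the toy scores 1 to 10 and cases at scores 4, 9 and 10, the 90th percentile cutoff is 9, so only the top-scoring patient is flagged; that gives sensitivity 1/3, specificity 1.0, PPV 1.0, NPV 7/9 and accuracy 0.8. A low base rate of undiagnosed dementia is why a high NPV can coexist with a modest PPV, as in the reported results.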

 Similar articles

Minmeng Tang, Tri Dev Acharya and Deb A. Niemeier    
Black carbon (BC) is a significant source of air pollution since it impacts public health and climate change. Understanding its distribution in the complex urban environment is challenging. We integrated a land use model with four machine learning models...

 
José Alfredo Flores Ronces, Edith R. Salcedo Sánchez, Manuel Martínez Morales, Juan Manuel Esquivel Martínez, Oscar Talavera Mendoza and María Vicenta Esteller Alberich    
The Taxco mining district is a well-known international producer of silver, jewelry, and precious metal handicrafts. Inappropriate disposal wastes from anthropogenic activities have been deteriorating the hydric resources and threatening the inhabitants'...
Journal: Water

 
Vishnupriya Jonnalagadda, Ji Yun Lee, Jie Zhao and Seyed Hooman Ghasemi    
The nation's transportation systems are complex and are some of the highest valued and largest public assets in the United States. As a result of repeated natural hazards and their significant impact on transportation functionality and the socioeconomic ...
Journal: Infrastructures

 
Alfieri Ek, Grant Drawve, Samantha Robinson and Jyotishka Datta    
Law enforcement agencies continue to grow in the use of spatial analysis to assist in identifying patterns of outcomes. Despite the critical nature of proper resource allocation for mental health incidents, there has been little progress in statistical m...

 
Luke Bergmann, Luis Fernando Chaves, David O'Sullivan and Robert G. Wallace
The spread of COVID-19 is geographically uneven in agricultural regions. Explanations proposed include differences in occupational risks, access to healthcare, racial inequalities, and approaches to public health. Here, we additionally explore the impact...