Resumen
Gestational diabetes mellitus (GDM) is a common condition with repercussions for both the mother and her child. Machine learning (ML) modeling techniques were proposed to predict the risk of several medical outcomes. A systematic evaluation of the predictive capacity of maternal factors resulting in GDM in the UAE is warranted. Data on a total of 3858 women who gave birth and had information on their GDM status in a birth cohort were used to fit the GDM risk prediction model. Information used for the predictive modeling were from self-reported epidemiological data collected at early gestation. Three different ML models, random forest (RF), gradient boosting model (GBM), and extreme gradient boosting (XGBoost), were used to predict GDM. Furthermore, to provide local interpretation of each feature in GDM diagnosis, features were studied using Shapley additive explanations (SHAP). Results obtained using ML models show that XGBoost, which achieved an AUC of 0.77, performed better compared to RF and GBM. Individual feature importance using SHAP value and the XGBoost model show that previous GDM diagnosis, maternal age, body mass index, and gravidity play a vital role in GDM diagnosis. ML models using self-reported epidemiological data are useful and feasible in prediction models for GDM diagnosis amongst pregnant women. Such data should be periodically collected at early pregnancy for health professionals to intervene at earlier stages to prevent adverse outcomes in pregnancy and delivery. The XGBoost algorithm was the optimal model for identifying the features that predict GDM diagnosis.