Resumen
Student dropout is a serious issue in that it not only affects the individual students who drop out but also has negative impacts on the former university, family, and society together. To resolve this, various attempts have been made to predict student dropout using machine learning. This paper presents a model to predict student dropout at Sahmyook University using machine learning. Academic records collected from 20,050 students of the university were analyzed and used for learning. Various machine learning algorithms were used to implement the model, including Logistic Regression, Decision Tree, Random Forest, Support Vector Machine, Deep Neural Network, and LightGBM (Light Gradient Boosting Machine), and their performances were compared through experiments. We also discuss the influence of oversampling used to resolve data imbalance issues in the dropout data. For this purpose, various oversampling algorithms such as SMOTE, ADASYN, and Borderline-SMOTE were tested. Our experimental results showed that the proposed model implemented using LightGBM provided the best performance with an F1-score of 0.840, which is higher than the results of previous studies discussing the dropout prediction with the issue of class imbalance.