Resumen
In the construction industry, it is difficult to predict occupational accidents because various accident characteristics arise simultaneously and organically in different types of work. Furthermore, even when analyzing occupational accident data, it is difficult to deduce meaningful results because the data recorded by the incident investigator are qualitative and include a wide variety of data types and categories. Recently, numerous studies have used machine learning to analyze the correlations in such complex construction accident data; however, heretofore the focus has been on predicting severity with various variables, and several limitations remain when deriving the correlations between features from various variables. Thus, this paper proposes a data processing procedure that can efficiently manipulate accident data using optimal machine learning techniques and derive and systematize meaningful variables to rationally approach such complex problems. In particular, among the various variables, the most influential variables are derived through methods such as clustering, chi-square, Cramer?s V, and predictor importance; then, the analysis is simplified by optimally grouping the variables. For accident data with optimal variables and elements, a predictive model is constructed between variables, using a support vector machine and decision-tree-based ensemble; then, the correlation between the dependent and independent variables is analyzed through an alluvial flow diagram for several cases. Therefore, a new processing procedure has been introduced in data preprocessing and accident prediction modelling to overcome difficulties from complex and diverse construction occupational accident data, and effective accident prevention is possible by deriving correlations of construction accidents using this process.