Resumen
The SECOM dataset contains information about a semiconductor production line, entailing the products that failed the in-house test line and their attributes. This dataset, similar to most semiconductor manufacturing data, contains missing values, imbalanced classes, and noisy features. In this work, the challenges of this dataset are met and many different approaches for classification are evaluated to perform fault diagnosis. We present an experimental evaluation that examines 288 combinations of different approaches involving data pruning, data imputation, feature selection, and classification methods, to find the suitable approaches for this task. Furthermore, a novel data imputation approach, namely ?In-painting KNN-Imputation? is introduced and is shown to outperform the common data imputation technique. The results show the capability of each classifier, feature selection method, data generation method, and data imputation technique, with a full analysis of their respective parameter optimizations.