Resumen
Parkinson?s disease is a neurodegenerative disease that is associated with genetic and environmental factors. However, the genes causing this degeneration have not been determined, and no reported cure exists for this disease. Recently, studies have been conducted to classify diseases with RNA-seq data using machine learning, and accurate diagnosis of diseases using machine learning is becoming an important task. In this study, we focus on how various feature selection methods can improve the performance of machine learning for accurate diagnosis of Parkinson?s disease. In addition, we analyzed the performance metrics and computational costs of running the model with and without various feature selection methods. Experiments were conducted using RNA sequencing?a technique that analyzes the transcription profiling of organisms using next-generation sequencing. Genetic algorithms (GA), information gain (IG), and wolf search algorithm (WSA) were employed as feature selection methods. Machine learning algorithms?extreme gradient boosting (XGBoost), deep neural network (DNN), support vector machine (SVM), and decision tree (DT)?were used as classifiers. Further, the model was evaluated using performance indicators, such as accuracy, precision, recall, F1" role="presentation">??1F1
F
1
score, and receiver operating characteristic (ROC) curve. For XGBoost and DNN, feature selection methods based on GA, IG, and WSA improved the performance of machine learning by 10.00% and 38.18%, respectively. For SVM and DT, performance was improved by 0.91% and 7.27%, respectively, with feature selection methods based on IG and WSA. The results demonstrate that various feature selection methods improve the performance of machine learning when classifying Parkinson?s disease using RNA-seq data.