Learning from High-Dimensional and Class-Imbalanced Datasets Using Random Forests

Barbara Pes

Abstract

Class imbalance and high dimensionality are two major issues in several real-life applications, e.g., in bioinformatics, text mining, and image classification. Although both issues have been studied extensively in the machine learning community, they have mostly been treated separately, and little research has so far addressed which approaches are best suited to datasets that are both class-imbalanced and high-dimensional (i.e., with a large number of features). This work contributes to this challenging research area by studying the effectiveness of hybrid learning strategies that integrate feature selection techniques, which reduce data dimensionality, with methods that counter the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods). Extensive experiments were carried out across datasets from different domains using a well-known classifier, the Random Forest, which has proven effective in high-dimensional spaces and has also been applied successfully to imbalanced tasks. Our results provide evidence of the benefits of such a hybrid approach compared with using feature selection or imbalance learning methods alone.
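The hybrid strategy described in the abstract (feature selection followed by an imbalance-aware Random Forest) can be illustrated with a minimal sketch. It assumes scikit-learn is available. The synthetic dataset, the univariate selector (`SelectKBest` with `f_classif`), the value `k=20`, and the use of `class_weight="balanced"` as a stand-in for a cost-sensitive method are all assumptions chosen for illustration; the abstract does not state which selectors, balancing methods, or settings the paper actually uses.

```python
# Illustrative sketch only: the data and hyperparameters below are assumptions,
# not the datasets or settings used in the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Synthetic high-dimensional, imbalanced problem:
# 500 features, roughly 5% of samples in the minority class.
X, y = make_classification(
    n_samples=600,
    n_features=500,
    n_informative=10,
    n_redundant=0,
    weights=[0.95, 0.05],
    random_state=0,
)

# Stratify so both splits keep the original class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hybrid pipeline: reduce dimensionality first, then fit a forest that
# weights classes inversely to their frequency (a cost-sensitive variant).
hybrid = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=20)),
    ("forest", RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=0
    )),
])
hybrid.fit(X_train, y_train)

# Balanced accuracy averages per-class recall, so it is not dominated
# by the majority class.
score = balanced_accuracy_score(y_test, hybrid.predict(X_test))
print(f"balanced accuracy: {score:.3f}")
```

Placing the selector inside the `Pipeline` means it is fitted on the training split only, which avoids leaking test information into the feature ranking.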

Keywords

high-dimensional data - feature selection - class imbalance - random forest

ISSUE

Volume: 12, Issue: 8 (2021)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/info12080286
