Information, Vol. 12, No. 8 (2021)

Learning from High-Dimensional and Class-Imbalanced Datasets Using Random Forests

Barbara Pes    

Abstract

Class imbalance and high dimensionality are two major issues in many real-life applications, e.g., in bioinformatics, text mining and image classification. However, while both issues have been extensively studied in the machine learning community, they have mostly been treated separately, and little research has so far examined which approaches are best suited to datasets that are both class-imbalanced and high-dimensional (i.e., with a large number of features). This work contributes to this challenging research area by studying the effectiveness of hybrid learning strategies that integrate feature selection techniques, to reduce the data dimensionality, with methods that counter the adverse effects of class imbalance (in particular, data balancing and cost-sensitive methods). Extensive experiments have been carried out on datasets from different domains using a well-known classifier, the Random Forest, which has proven effective in high-dimensional spaces and has also been successfully applied to imbalanced tasks. Our results provide evidence of the benefits of this hybrid approach compared with using feature selection or imbalance learning methods alone.
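The hybrid strategy described above pairs a dimensionality-reduction step with an imbalance-handling step before the Random Forest is trained. The following is a minimal, dependency-free Python sketch of those data-side steps, assuming binary 0/1 labels. A Fisher-style filter score stands in for the feature selection techniques (the abstract does not say which ones were used), random undersampling stands in for data balancing, and inverse-frequency class weights illustrate the cost-sensitive route. All function names are hypothetical, and the ordering shown (select features, then balance) is one possible choice, not necessarily the protocol used in the study.

```python
import random
from collections import Counter
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Filter-style relevance score per feature (binary labels 0/1):
    squared gap between class means over the sum of within-class variances."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        gap = (mean(pos) - mean(neg)) ** 2
        spread = pvariance(pos) + pvariance(neg)
        if spread > 0:
            scores.append(gap / spread)
        else:
            # No within-class spread: perfectly separating if the means differ.
            scores.append(float("inf") if gap > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Dimensionality reduction: keep the k highest-scoring feature columns."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


def random_undersample(X, y, seed=0):
    """Data balancing: randomly drop rows so every class has the minority count."""
    rng = random.Random(seed)
    by_class = {label: [i for i, v in enumerate(y) if v == label] for label in set(y)}
    minority_size = min(len(idx) for idx in by_class.values())
    kept = sorted(i for idx in by_class.values() for i in rng.sample(idx, minority_size))
    return [X[i] for i in kept], [y[i] for i in kept]


def balanced_class_weights(y):
    """Cost-sensitive alternative: weight class c by n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n, n_classes = len(y), len(counts)
    return {label: n / (n_classes * count) for label, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy data: only feature 1 separates the classes,
    # and class 1 is the minority (2 of 10 rows).
    X = [[0.2, 0.0, 1.0], [0.4, 0.1, 0.0], [0.6, 0.0, 1.0], [0.8, 0.1, 0.0],
         [0.2, 0.0, 1.0], [0.4, 0.1, 0.0], [0.6, 0.0, 1.0], [0.8, 0.1, 0.0],
         [0.3, 1.0, 1.0], [0.7, 0.9, 0.0]]
    y = [0] * 8 + [1] * 2
    keep, X_reduced = select_top_k(X, y, k=1)
    X_bal, y_bal = random_undersample(X_reduced, y)
    print("kept features:", keep)
    print("balanced class counts:", Counter(y_bal))
    print("class weights:", balanced_class_weights(y))
```

The Random Forest itself is left out. With scikit-learn, for instance, the reduced and rebalanced data could be passed to `RandomForestClassifier`, or the weights could be supplied through its `class_weight` parameter instead of undersampling. In a real evaluation both steps should be fit on the training data only, so that no test information leaks into the selected features.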

Similar articles

       
 
Ugur Ercan, Onder Kabas and Georgiana Moiceanu    
Alfalfa holds an extremely significant place in animal nutrition when it comes to providing essential nutrients. The leaves of alfalfa specifically boast the highest nutritional value, containing a remarkable 70% of crude protein and an impressive 90% of...
Journal: Applied Sciences

 
Suryakant Tyagi and Sándor Szénási    
Machine learning and speech emotion recognition are rapidly evolving fields, significantly impacting human-centered computing. Machine learning enables computers to learn from data and make predictions, while speech emotion recognition allows computers t...
Journal: Algorithms

 
Abdul Rahaman Wahab Sait and Ali Mohammad Alorsan Bani Awad    
Coronary artery disease (CAD) is the most prevalent form of cardiovascular disease that may result in myocardial infarction. Annually, it leads to millions of fatalities and causes billions of dollars in global economic losses. Limited resources and comp...
Journal: Applied Sciences

 
Longxin Yao, Yun Lu, Mingjiang Wang, Yukun Qian and Heng Li    
The construction of complex networks from electroencephalography (EEG) proves to be an effective method for representing emotion patterns in affection computing as it offers rich spatiotemporal EEG features associated with brain emotions. In this paper, ...
Journal: Applied Sciences

 
Futo Ueda, Hiroto Tanouchi, Nobuyuki Egusa and Takuya Yoshihiro    
River water-level prediction is crucial for mitigating flood damage caused by torrential rainfall. In this paper, we attempt to predict river water levels using a deep learning model based on radar rainfall data instead of data from upstream hydrological...
Journal: Water