Future Internet, Vol. 14, No. 3 (2022)
ARTICLE
TITLE

A Density-Based Random Forest for Imbalanced Data Classification

Jia Dong and Quan Qian    

Abstract

Many machine learning problem domains, such as the detection of fraud, spam, outliers, and anomalies, involve inherently imbalanced class distributions. However, most classification algorithms assume roughly equal sample sizes for each class, so imbalanced datasets pose a significant challenge for predictive modeling. Herein, we propose a density-based random forest algorithm (DBRF) to improve prediction performance, especially for minority classes. DBRF identifies boundary samples as the most difficult to classify and then uses a density-based method to augment them. Subsequently, two random forest classifiers are constructed to model the augmented boundary samples and the original dataset, respectively, and the final output is determined using a bagging technique. A real-world material classification dataset and 33 public imbalanced datasets were used to evaluate DBRF. On these 34 datasets, DBRF achieved improvements of 2–15% over random forest in terms of the F1-measure and G-mean. The experimental results demonstrate the ability of DBRF to classify objects located on the class boundary, including objects of minority classes, by taking into account the density of objects in space.
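
The abstract reports results in terms of the F1-measure and G-mean. As a minimal sketch of how these two metrics are commonly defined for a binary problem, the following pure-Python functions compute them from true and predicted labels. The function names, the choice of label `1` as the minority (positive) class, and the toy labels are illustrative assumptions; they are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_measure(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy imbalanced labels (hypothetical): 2 minority samples, 8 majority samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate here,
# yet both metrics are 0 because no minority sample is recovered.
always_majority = [0] * 10
print(f1_measure(y_true, always_majority), g_mean(y_true, always_majority))

# A classifier that finds one of the two minority samples, with one false alarm.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
print(f1_measure(y_true, y_pred), g_mean(y_true, y_pred))
```

Both metrics penalize a model that ignores the minority class, which is why they are often preferred over plain accuracy when class distributions are imbalanced.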

Similar articles

       
 
Ali Mirzaei, Hossein Bagheri and Iman Khosravi    
Crop classification using remote sensing data has emerged as a prominent research area in recent decades. Studies have demonstrated that fusing synthetic aperture radar (SAR) and optical images can significantly enhance the accuracy of classification. Ho...

 
Yiliang Wan, Yuwen Fei, Rui Jin, Tao Wu and Xinguang He    
The effective extraction of impervious surfaces is critical to monitor their expansion and ensure the sustainable development of cities. Open geographic data can provide a large number of training samples for machine learning methods based on remote-sens...

 
Viera Maslej-Krešňáková, Martin Sarnovský and Júlia Jacková
The work presented in this paper focuses on the use of data augmentation techniques applied in the domain of the detection of antisocial behavior. Data augmentation is a frequently used approach to overcome issues related to the lack of data or problems ...
Journal: Future Internet

 
Fuan Tsai, Jhe-Syuan Lai, Kieu Anh Nguyen and Walter Chen    
The universal soil loss equation (USLE) is a widely used empirical model for estimating soil loss. Among the USLE model factors, the cover management factor (C-factor) is a critical factor that substantially impacts the estimation result. Assigning C-fac...

 
Huan Ning, Zhenlong Li, Michael E. Hodgson and Cuizhen (Susan) Wang    
This article aims to implement a prototype screening system to identify flooding-related photos from social media. These photos, associated with their geographic locations, can provide free, timely, and reliable visual information about flood events to t...