A Density-Based Random Forest for Imbalanced Data Classification

Jia Dong and Quan Qian

Abstract

Many machine learning problem domains, such as the detection of fraud, spam, outliers, and anomalies, involve inherently imbalanced class distributions. However, most classification algorithms assume roughly equal sample sizes for each class, so imbalanced datasets pose a significant challenge for predictive modeling. Herein, we propose a density-based random forest algorithm (DBRF) to improve prediction performance, especially for minority classes. DBRF is designed to recognize boundary samples as the most difficult to classify and then uses a density-based method to augment them. Subsequently, two different random forest classifiers were constructed to model the augmented boundary samples and the original dataset dependently, and the final output was determined using a bagging technique. A real-world material classification dataset and 33 open public imbalanced datasets were used to evaluate the performance of DBRF. On the 34 datasets, DBRF achieved improvements of 2–15% over random forest in terms of the F1-measure and G-mean. The experimental results demonstrated the ability of DBRF to classify objects located on the class boundary, including objects of minority classes, by taking into account the density of objects in space.
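
The abstract reports results in terms of the F1-measure and G-mean but does not define them. The following is a minimal sketch, assuming the standard binary definitions (F1 as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity) with the minority class treated as positive. The function names `confusion_counts`, `f1_measure`, and `g_mean` are hypothetical and not taken from the paper, which may use a different formulation, for example for multi-class datasets.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1_measure(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def g_mean(y_true, y_pred, positive=1):
    """G-mean = sqrt(sensitivity * specificity) (assumed binary definition)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Toy data: 2 minority (1) and 8 majority (0) samples.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    always_majority = [0] * 10  # 80% accurate, yet finds no minority sample
    print(f1_measure(y_true, always_majority), g_mean(y_true, always_majority))
```

On this toy data, a classifier that always predicts the majority class is 80% accurate but scores 0.0 on both metrics, which illustrates why metrics of this kind are preferred to plain accuracy on imbalanced data.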

Keywords

density-based random forest - imbalanced data classification - boundary and density domain partition

ISSUE

Volume: 14 Issue: 3 (2022)

SUBJECTS

INFRASTRUCTURE

SIMILAR JOURNALS

ISPRS International Journal of Geo-Information
Future Internet
Big Data and Cognitive Computing

DOI

https://doi.org/10.3390/fi14030090

Similar articles

Enhancing Crop Classification Accuracy through Synthetic SAR-Optical Data Generation Using Deep Learning

Ali Mirzaei, Hossein Bagheri and Iman Khosravi

Crop classification using remote sensing data has emerged as a prominent research area in recent decades. Studies have demonstrated that fusing synthetic aperture radar (SAR) and optical images can significantly enhance the accuracy of classification. Ho...

Journal: ISPRS International Journal of Geo-Information

An Object-Oriented Deep Multi-Sphere Support Vector Data Description Method for Impervious Surfaces Extraction Based on Multi-Sourced Data

Yiliang Wan, Yuwen Fei, Rui Jin, Tao Wu and Xinguang He

The effective extraction of impervious surfaces is critical to monitor their expansion and ensure the sustainable development of cities. Open geographic data can provide a large number of training samples for machine learning methods based on remote-sens...

Journal: ISPRS International Journal of Geo-Information

Use of Data Augmentation Techniques in Detection of Antisocial Behavior Using Deep Learning Methods

Viera Maslej-Krešňáková, Martin Sarnovský and Júlia Jacková

The work presented in this paper focuses on the use of data augmentation techniques applied in the domain of the detection of antisocial behavior. Data augmentation is a frequently used approach to overcome issues related to the lack of data or problems ...

Journal: Future Internet

Determining Cover Management Factor with Remote Sensing and Spatial Analysis for Improving Long-Term Soil Loss Estimation in Watersheds

Fuan Tsai, Jhe-Syuan Lai, Kieu Anh Nguyen and Walter Chen

The universal soil loss equation (USLE) is a widely used empirical model for estimating soil loss. Among the USLE model factors, the cover management factor (C-factor) is a critical factor that substantially impacts the estimation result. Assigning C-fac...

Journal: ISPRS International Journal of Geo-Information

Prototyping a Social Media Flooding Photo Screening System Based on Deep Learning

Huan Ning, Zhenlong Li, Michael E. Hodgson and Cuizhen (Susan) Wang

This article aims to implement a prototype screening system to identify flooding-related photos from social media. These photos, associated with their geographic locations, can provide free, timely, and reliable visual information about flood events to t...

Journal: ISPRS International Journal of Geo-Information
