Resampling Imbalanced Network Intrusion Datasets to Identify Rare Attacks

Sikha Bagui

Dustin Mink

Subhash Bagui

Sakthivel Subramaniam and Daniel Wallace

Abstract

This study, which focused on identifying rare attacks in imbalanced network intrusion datasets, explored the effect of using different ratios of oversampled to undersampled data for binary classification. Two designs were compared: random undersampling before splitting the data into training and testing sets, and random undersampling after splitting. The study also examined how oversampling/undersampling ratios affect random forest classification rates in datasets with minority data or rare attacks. The results suggest that random undersampling before splitting gives better classification rates; however, random undersampling after oversampling with BSMOTE allows for the use of lower ratios of oversampled data.
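The two designs differ only in where the undersampling step sits relative to the train/test split. The following is a minimal sketch of that difference, not the authors' implementation: it uses only the Python standard library, toy data, and hypothetical helper names (`random_undersample`, `split_train_test`), and it omits the BSMOTE oversampling and random forest steps entirely. The 30% test share and the choice to keep 100 majority rows are assumptions for illustration.

```python
import random
from collections import Counter

MAJORITY = 0  # benign traffic (assumed label)
MINORITY = 1  # rare attack (assumed label)

def random_undersample(rows, keep, rng):
    """Keep every minority row and a random sample of `keep` majority rows."""
    majority = [r for r in rows if r[1] == MAJORITY]
    minority = [r for r in rows if r[1] != MAJORITY]
    sampled = rng.sample(majority, min(keep, len(majority)))
    out = minority + sampled
    rng.shuffle(out)
    return out

def split_train_test(rows, test_percent, rng):
    """Shuffle and split; integer arithmetic keeps the sizes exact."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = len(shuffled) * test_percent // 100
    return shuffled[n_test:], shuffled[:n_test]

def undersample_before_split(rows, keep, rng):
    # Design 1: reduce the majority class first, then split.
    # Both the training and the testing set come from the reduced data.
    reduced = random_undersample(rows, keep, rng)
    return split_train_test(reduced, 30, rng)

def undersample_after_split(rows, keep, rng):
    # Design 2: split first, then reduce the majority class in training only.
    # The testing set keeps the original class imbalance.
    train, test = split_train_test(rows, 30, rng)
    return random_undersample(train, keep, rng), test

rng = random.Random(0)
# Toy data: 1000 benign rows and 20 attack rows, as (row_id, label) pairs.
rows = [(i, MAJORITY) for i in range(1000)] + [(1000 + i, MINORITY) for i in range(20)]

train_a, test_a = undersample_before_split(rows, 100, rng)
train_b, test_b = undersample_after_split(rows, 100, rng)

print("before split:", Counter(l for _, l in train_a), Counter(l for _, l in test_a))
print("after split: ", Counter(l for _, l in train_b), Counter(l for _, l in test_b))
```

In this sketch the first design evaluates on a test set whose class balance has already been altered, while the second evaluates on data that still has the original imbalance, which is one reason the two designs can report different classification rates.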

Keywords

imbalanced data - resampling - rare attacks - network intrusion datasets - minority data - oversampling - BSMOTE - random undersampling - random forest

ISSUE

Volume: 15 Issue: 4 (2023)

SUBJECTS

INFRASTRUCTURE

SIMILAR JOURNALS

Future Internet
Big Data and Cognitive Computing
Infrastructures
