JOURNAL
Applied System Innovation

ARTICLE

TITLE

SMOTE-ENC: A Novel SMOTE-Based Method to Generate Synthetic Data for Nominal and Continuous Features

Mimi Mukherjee and Matloob Khushi

Abstract

Real-world datasets are often heavily skewed, with some classes significantly outnumbered by others. In these situations, machine learning algorithms fail to achieve substantial efficacy when predicting the underrepresented instances. To solve this problem, many variations of the synthetic minority oversampling technique (SMOTE) have been proposed to balance datasets with continuous features. However, for datasets with both nominal and continuous features, SMOTE-NC is the only SMOTE-based oversampling technique available to balance the data. In this paper, we present a novel minority oversampling method, SMOTE-ENC (SMOTE-Encoded Nominal and Continuous), in which nominal features are encoded as numeric values and the difference between two such numeric values reflects the amount of change of association with the minority class. Our experiments show that classification models using the SMOTE-ENC method offer better prediction than models using SMOTE-NC when the dataset has a substantial number of nominal features and when there is some association between the categorical features and the target class. Additionally, our proposed method addresses one of the major limitations of the SMOTE-NC algorithm: SMOTE-NC can be applied only to mixed datasets that contain both continuous and nominal features, and it cannot function if all the features of the dataset are nominal. Our novel method has been generalized so that it can be applied to both mixed datasets and nominal-only datasets.
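
The encoding idea in the abstract can be illustrated with a small sketch. The abstract does not state the encoding formula, so the score used here, (observed - expected) / expected, is an assumption chosen only to show how a nominal value might be mapped to a number that reflects its association with the minority class. The function name `encode_nominal` and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter


def encode_nominal(values, labels, minority_label):
    """Score each category by how over- or under-represented it is
    in the minority class.

    Assumed score (illustrative, not taken from the abstract):
    (observed - expected) / expected, where `expected` is the minority
    count the category would have if it were independent of the class.
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    minority_total = sum(1 for y in labels if y == minority_label)
    if minority_total == 0:
        raise ValueError("no minority-class instances found")
    minority_rate = minority_total / len(labels)

    # Rows per category overall, and rows per category within the minority class.
    totals = Counter(values)
    observed = Counter(v for v, y in zip(values, labels) if y == minority_label)

    encoding = {}
    for category, total in totals.items():
        expected = total * minority_rate
        encoding[category] = (observed[category] - expected) / expected
    return encoding


# Toy data: 8 rows, 2 of them in the minority class (label 1).
values = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 0, 0, 0, 0, 0, 0]
encoding = encode_nominal(values, labels, minority_label=1)
print(encoding)
```

In this toy example, category `a` holds both minority rows and receives a positive score, while category `b` holds none and receives a negative score. The gap between the two numbers is what a distance-based oversampler could then use in place of a fixed penalty for mismatched categories.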

Keywords

SMOTE - nominal feature - continuous feature - class imbalance - precision - recall - area under receiver operating characteristic curve (ROC-AUC) - area under precision-recall curve (PR-AUC)

ISSUE

Volume: 4 Part: 1 (2021)

SUBJECTS

ENGINEERING AND CIVIL CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/asi4010018
