Textual Feature Extraction Using Ant Colony Optimization for Hate Speech Classification

Shilpa Gite

Shruti Patil

Deepak Dharrao

Madhuri Yadav

Sneha Basak

Arundarasi Rajendran and Ketan Kotecha

Abstract

Feature selection and feature extraction are central to text categorization because they remove redundant and irrelevant features, reduce the size of the vector space, limit computational time, and improve classification accuracy. These feature engineering techniques can be tuned further with optimization algorithms. This paper proposes such a framework: it applies one such algorithm, Ant Colony Optimization (ACO), together with different feature selection and feature extraction techniques to textual and numerical datasets, using four machine learning (ML) models: Logistic Regression (LR), K-Nearest Neighbor (KNN), Stochastic Gradient Descent (SGD), and Random Forest (RF). The aim is to compare the results achieved on the two datasets. The proposed feature selection and feature extraction techniques help improve the performance of the ML models. The numerical dataset is used for stroke prediction and the text dataset for hate speech detection. The text dataset was prepared by extracting tweets with positive, negative, and neutral sentiments through the Twitter API. Applying ACO yields a maximum accuracy improvement of 10.07%, observed for Random Forest with the TF-IDF feature extraction technique. The study also highlights the limitations of text data that inhibit the performance of machine learning models, which account for a difference of almost 18.43% in accuracy compared with numerical data.
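
The abstract does not spell out how ACO is configured, so the following is only a minimal sketch of the general idea of ACO-based feature subset selection, not the authors' implementation. The function name `aco_select`, its parameters, and the toy `fitness` function are all assumptions made for illustration; in a setting like the one described, the fitness would more plausibly be the validation accuracy of a classifier trained on the selected TF-IDF or BoW columns.

```python
import random

def aco_select(n_features, fitness, k=3, n_ants=10, n_iters=20,
               evaporation=0.2, seed=0):
    """Pick k of n_features with a simple ant colony search (illustrative).

    fitness(subset) must return a non-negative score; higher is better.
    """
    rng = random.Random(seed)
    pheromone = [1.0] * n_features
    best_subset, best_score = None, float("-inf")
    for _ in range(n_iters):
        trails = []
        for _ in range(n_ants):
            # Each ant samples k distinct features, biased by pheromone.
            candidates = list(range(n_features))
            subset = []
            for _ in range(k):
                weights = [pheromone[j] for j in candidates]
                pick = rng.choices(candidates, weights=weights)[0]
                candidates.remove(pick)
                subset.append(pick)
            score = fitness(subset)
            trails.append((subset, score))
            if score > best_score:
                best_subset, best_score = sorted(subset), score
        # Evaporate old pheromone, then deposit in proportion to quality.
        pheromone = [(1 - evaporation) * p for p in pheromone]
        for subset, score in trails:
            for j in subset:
                pheromone[j] += score
    return best_subset, best_score

# Hypothetical stand-in for classifier accuracy: the fraction of
# selected features that belong to a known "informative" set.
INFORMATIVE = {1, 4, 7}

def toy_fitness(subset):
    return len(INFORMATIVE & set(subset)) / len(subset)

best, score = aco_select(n_features=10, fitness=toy_fitness)
```

Features that appear in high-scoring subsets accumulate pheromone and are sampled more often in later iterations, while evaporation keeps early choices from dominating the search.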

Keywords

feature engineering; Term Frequency-Inverse Document Frequency (TF-IDF); Bag of Words (BoW); Chi-square test; Ant Colony Optimization (ACO); machine learning

Issue

Volume: 7 Part: 1 (2023)

Subjects

Infrastructure
