JOURNAL
Applied Sciences

ARTICLE

TITLE

Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification

Viera Maslej-Krešňáková

Martin Sarnovský

Peter Butka and Kristína Machová

Abstract

The emergence of anti-social behaviour in online environments presents a serious issue in today's society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools for detecting different types of anti-social behaviour in text. In this work, we compare various deep learning models used to identify toxic comments in Internet discussions. Our main goal was to explore the effect of data preparation on model performance. Because we assumed that traditional pre-processing methods may remove characteristic traits specific to toxic content, we compared several popular deep learning and transformer language models. We aimed to analyze the influence of different pre-processing techniques and text representations, including standard TF-IDF and pre-trained word embeddings, and also explored currently popular transformer models. Experiments were performed on the dataset from the Kaggle Toxic Comment Classification competition, and the best-performing model was compared with similar approaches using standard evaluation metrics.
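
The abstract's central assumption is that traditional pre-processing can erase cues typical of toxic text before any TF-IDF or embedding representation is built. The sketch below illustrates that effect under our own assumptions; it is not the authors' pipeline. The stopword list, the two tokenizer functions (standard_preprocess, minimal_preprocess) and the smoothed IDF formula (one common variant, similar to scikit-learn's default) are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"a", "an", "and", "are", "is", "the", "this", "you"}


def standard_preprocess(text: str) -> list[str]:
    """'Traditional' cleaning: lowercase, replace non-letters, drop stopwords."""
    text = text.lower()
    # Punctuation, digits and '*' become spaces, so "!!!" and "f***" lose their cues.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def minimal_preprocess(text: str) -> list[str]:
    """Light tokenisation that keeps case, '!' runs and masked words like 'f***'."""
    return re.findall(r"[A-Za-z*]+|!+", text)


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Raw term counts weighted by a smoothed IDF: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: count * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    comments = [
        "YOU ARE AN IDIOT!!!",
        "What the f*** is this",
        "Thanks, this is a good edit",
    ]
    for name, prep in [("standard", standard_preprocess), ("minimal", minimal_preprocess)]:
        docs = [prep(c) for c in comments]
        print(name, docs)
        print(name, tf_idf(docs)[0])
```

With the standard cleaning, the first comment reduces to the single token "idiot", so the capitalised shouting and the "!!!" run never reach the TF-IDF vector, while the minimal tokeniser keeps them as separate features. Whether keeping such features improves classification is an empirical question; this toy example only shows which tokens survive each pipeline.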

Keywords

natural language processing - toxic comments - classification - deep learning - neural networks

ISSUE

Volume: 10, Issue: 23 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Water
Information
Applied Sciences

DOI

https://doi.org/10.3390/app10238631

Similar articles

Comparative Analysis of NLP-Based Models for Company Classification

Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov

The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow...

Journal: Information

Advances in Facial Expression Recognition: A Survey of Methods, Benchmarks, Models, and Datasets

Thomas Kopalidis, Vassilios Solachidis, Nicholas Vretos and Petros Daras

Recent technological developments have enabled computers to identify and categorize facial expressions to determine a person's emotional state in an image or a video. This process, called "Facial Expression Recognition (FER)", has become one of the most ...

Journal: Information

NSGA-III-XGBoost-Based Stochastic Reliability Analysis of Deep Soft Rock Tunnel

Jiancong Xu, Chen Sun and Guorong Rui

How to evaluate the reliability of deep soft rock tunnels under high stress is a very important problem to be solved. In this paper, we proposed a practical stochastic reliability method based on the third-generation non-dominated sorting genetic algorit...

Journal: Applied Sciences

A CNN-GRU Hybrid Model for Predicting Airport Departure Taxiing Time

Ligang Yuan, Jing Liu, Haiyan Chen, Daoming Fang and Wenlu Chen

Scene taxiing time is an important indicator for assessing the operational efficiency of airports as well as green airports, and it is also a fundamental parameter in flight regularity statistics. The accurate prediction of taxiing time can help decision...

Journal: Aerospace

Downscaling Daily Reference Evapotranspiration Using a Super-Resolution Convolutional Transposed Network

Yong Liu, Xiaohui Yan, Wenying Du, Tianqi Zhang, Xiaopeng Bai and Ruichuan Nan

The current work proposes a novel super-resolution convolutional transposed network (SRCTN) deep learning architecture for downscaling daily climatic variables. The algorithm was established based on a super-resolution convolutional neural network with t...

Journal: Water
