JOURNAL
Applied Sciences

ARTICLE

TITLE

Comparison of Deep Learning Models and Various Text Pre-Processing Techniques for the Toxic Comments Classification

Viera Maslej-Krešňáková

Martin Sarnovský

Peter Butka and Kristína Machová

Abstract

The emergence of anti-social behaviour in online environments presents a serious issue in today's society. Automatic detection and identification of such behaviour are becoming increasingly important. Modern machine learning and natural language processing methods can provide effective tools to detect different types of anti-social behaviour from the pieces of text. In this work, we present a comparison of various deep learning models used to identify the toxic comments in the Internet discussions. Our main goal was to explore the effect of the data preparation on the model performance. As we worked with the assumption that the use of traditional pre-processing methods may lead to the loss of characteristic traits, specific for toxic content, we compared several popular deep learning and transformer language models. We aimed to analyze the influence of different pre-processing techniques and text representations including standard TF-IDF, pre-trained word embeddings and also explored currently popular transformer models. Experiments were performed on the dataset from the Kaggle Toxic Comment Classification competition, and the best performing model was compared with the similar approaches using standard metrics used in data analysis.
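
The assumption stated in the abstract, that traditional pre-processing may remove traits characteristic of toxic content, can be illustrated with a small sketch. The code below is not the authors' pipeline: the toy comments, the `tokenize` and `tfidf` helpers, and the plain TF-IDF weighting (term frequency times the log of inverse document frequency, without smoothing) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text, aggressive):
    """Split text into tokens.

    aggressive=True mimics traditional cleaning: lowercase, letters only.
    aggressive=False keeps case and treats runs of '!' or '*' as tokens.
    """
    if aggressive:
        return re.findall(r"[a-z]+", text.lower())
    return re.findall(r"[A-Za-z]+|[!*]+", text)


def tfidf(docs, aggressive):
    """Return one {token: weight} dict per document (unsmoothed TF-IDF)."""
    tokenized = [tokenize(d, aggressive) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            tok: (count / len(toks)) * math.log(n_docs / doc_freq[tok])
            for tok, count in counts.items()
        })
    return vectors


# Hypothetical toy comments, not taken from the Kaggle dataset.
comments = [
    "you are an idiot",
    "YOU are an IDIOT !!!",
    "thanks, you are right",
]

cleaned = tfidf(comments, aggressive=True)   # traditional cleaning
raw = tfidf(comments, aggressive=False)      # case and '!' runs preserved

print(cleaned[0] == cleaned[1])  # both comments collapse to the same vector
print(sorted(raw[1]))            # shouted tokens and '!!!' stay distinct features
```

Under these assumptions, the cleaned representation cannot distinguish the calm comment from the shouted one, while the lightly processed representation keeps capitalisation and repeated punctuation as separate features that a classifier could use.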

Keywords

natural language processing - toxic comments - classification - deep learning - neural networks

ISSUE

Volume: 10 Issue: 23 (2020)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Water
Information
Applied Sciences

DOI

https://doi.org/10.3390/app10238631

Similar articles

Comparative Analysis of NLP-Based Models for Company Classification

Maryan Rizinski, Andrej Jankov, Vignesh Sankaradas, Eugene Pinsky, Igor Mishkovski and Dimitar Trajanov

The task of company classification is traditionally performed using established standards, such as the Global Industry Classification Standard (GICS). However, these approaches heavily rely on laborious manual efforts by domain experts, resulting in slow...

Journal: Information

Advances in Facial Expression Recognition: A Survey of Methods, Benchmarks, Models, and Datasets

Thomas Kopalidis, Vassilios Solachidis, Nicholas Vretos and Petros Daras

Recent technological developments have enabled computers to identify and categorize facial expressions to determine a person's emotional state in an image or a video. This process, called "Facial Expression Recognition (FER)", has become one of the most ...

Journal: Information

A CNN-GRU Hybrid Model for Predicting Airport Departure Taxiing Time

Ligang Yuan, Jing Liu, Haiyan Chen, Daoming Fang and Wenlu Chen

Scene taxiing time is an important indicator for assessing the operational efficiency of airports as well as green airports, and it is also a fundamental parameter in flight regularity statistics. The accurate prediction of taxiing time can help decision...

Journal: Aerospace

Tibetan Sentence Boundaries Automatic Disambiguation Based on Bidirectional Encoder Representations from Transformers on Byte Pair Encoding Word Cutting Method

Fenfang Li, Zhengzhang Zhao, Li Wang and Han Deng

Sentence Boundary Disambiguation (SBD) is crucial for building datasets for tasks such as machine translation, syntactic analysis, and semantic analysis. Currently, most automatic sentence segmentation in Tibetan adopts the methods of rule-based and stat...

Journal: Applied Sciences

Introducing an Artificial Neural Network for Virtually Increasing the Sample Size of Bioequivalence Studies

Dimitris Papadopoulos and Vangelis D. Karalis

Sample size is a key factor in bioequivalence and clinical trials. An appropriately large sample is necessary to gain valuable insights into a designated population. However, large sample sizes lead to increased human exposure, costs, and a longer time f...

Journal: Applied Sciences
