Abstract
To effectively process textual data, many approaches have been proposed for creating text representations. The transformation of a text into numbers that can be processed by computers is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations produced by statistical methods and compare them to approaches based on neural networks. We describe in detail nine different algorithms used for text representation and then evaluate them on five diverse datasets: BBCSport, BBC, Ohsumed, 20Newsgroups, and Reuters. The selected statistical models include Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TFIDF) weighting, Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). For the second group, based on deep neural networks, Partition-Smooth Inverse Frequency (P-SIF), Doc2Vec-Distributed Bag of Words Paragraph Vector (Doc2Vec-DBoW), Doc2Vec-Memory Model of Paragraph Vectors (Doc2Vec-DM), Hierarchical Attention Network (HAN), and Longformer were selected. The text representation methods were benchmarked on the document classification task, with the BoW and TFIDF models used as a baseline. Based on the identified weaknesses of the HAN method, an improvement in the form of a Hierarchical Weighted Attention Network (HWAN) is proposed. Incorporating statistical features into HAN latent representations improves or matches the results on four out of five datasets. The article also presents how the length of the processed text affects the results of the HAN model and the HWAN variants.