Informatics  /  Vol: 8 Par: 1 (2021)
ARTICLE
TITLE

The Rare Word Issue in Natural Language Generation: A Character-Based Solution

Giovanni Bonetta    
Marco Roberti    
Rossella Cancelliere and Patrick Gallinari    

Abstract

In this paper, we analyze the problem of generating fluent English utterances from tabular data, focusing on the development of a sequence-to-sequence neural model with two major features: the ability to read and generate character by character, and the ability to switch between generating characters and copying them from the input, which is essential when inputs contain rare words such as proper names, telephone numbers, or foreign words. Working with characters instead of words is challenging: it can make the training phase more difficult and increase the probability of errors during inference. Nevertheless, our work shows that these issues can be solved, and the effort is repaid by a fully end-to-end system whose inputs and outputs are not constrained to a predefined vocabulary, as they are in word-based models. Furthermore, our copying technique is integrated with an innovative shift mechanism, which enhances the ability to produce outputs directly from inputs. We assess performance on the E2E dataset, the benchmark used for the E2E NLG challenge, and on a modified version of it, created to highlight the rare-word copying capabilities of our model. The results demonstrate clear improvements over the baseline and promising performance compared to recent techniques in the literature.
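
To make the idea of switching between generating and copying characters more concrete, here is a minimal sketch of a generic pointer-generator-style mixture. It is not the authors' model and does not include their shift mechanism. The function name `mix_generate_and_copy`, the switch probability `p_gen`, and the toy numbers are all assumptions made for illustration.

```python
from collections import defaultdict


def mix_generate_and_copy(p_gen, gen_probs, attention, source_chars):
    """Blend a generation distribution with a copy distribution.

    Hypothetical helper, not taken from the paper.
    p_gen        : probability of generating rather than copying, in [0, 1]
    gen_probs    : dict mapping each alphabet character to its decoder probability
    attention    : attention weights, one per source character (summing to 1)
    source_chars : the input string the attention weights refer to
    """
    if len(attention) != len(source_chars):
        raise ValueError("need one attention weight per source character")

    final = defaultdict(float)
    # Generation path: scale the decoder's distribution by p_gen.
    for ch, p in gen_probs.items():
        final[ch] += p_gen * p
    # Copy path: each source position contributes its attention mass,
    # so characters outside the decoder's alphabet can still be produced.
    for ch, a in zip(source_chars, attention):
        final[ch] += (1.0 - p_gen) * a
    return dict(final)


# Toy example: the decoder only knows "a" and "b", but the input contains "Z".
gen_probs = {"a": 0.5, "b": 0.5}
attention = [0.1, 0.9]  # weights over the source string "aZ"
dist = mix_generate_and_copy(0.2, gen_probs, attention, "aZ")
best = max(dist, key=dist.get)
print(best, dist)
```

In this toy setting the low `p_gen` shifts most of the probability mass to the copy path, so the character "Z" is selected even though it has no entry in the generation distribution. This is the behavior one would want for rare words such as proper names.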
