Redirigiendo al acceso original de articulo en 16 segundos...
Inicio  /  Algorithms  /  Vol: 15 Par: 12 (2022)  /  Artículo
ARTÍCULO
TITULO

RoSummary: Control Tokens for Romanian News Summarization

Mihai Alexandru Niculescu    
Stefan Ruseti and Mihai Dascalu    

Resumen

Significant progress has been achieved in text generation due to recent developments in neural architectures; nevertheless, this task remains challenging, especially for low-resource languages. This study is centered on developing a model for abstractive summarization in Romanian. A corresponding dataset for summarization is introduced, followed by multiple models based on the Romanian GPT-2, on top of which control tokens were considered to specify characteristics for the generated text, namely: counts of sentences and words, token ratio, and n-gram overlap. These are special tokens defined in the prompt received by the model to indicate traits for the text to be generated. The initial model without any control tokens was assessed using BERTScore (F1" role="presentation">F1F1 F 1 = 73.43%) and ROUGE (ROUGE-L accuracy = 34.67%). Control tokens improved the overall BERTScore to 75.42% using , while the model was influenced more by the second token specified in the prompt when performing various combinations of tokens. Six raters performed human evaluations of 45 generated summaries with different models and decoding methods. The generated texts were all grammatically correct and consistent in most cases, while the evaluations were promising in terms of main idea coverage, details, and cohesion. Paraphrasing still requires improvements as the models mostly repeat information from the reference text. In addition, we showcase an exploratory analysis of the generated summaries using one or two specific control tokens.

 Artículos similares

       
 
Teerapong Panboonyuen, Sittinun Thongbai, Weerachai Wongweeranimit, Phisan Santitamnont, Kittiwan Suphan and Chaiyut Charoenphon    
Due to the various sizes of each object, such as kilometer stones, detection is still a challenge, and it directly impacts the accuracy of these object counts. Transformers have demonstrated impressive results in various natural language processing (NLP)... ver más
Revista: Information

 
Carlos Azevedo, António Matos, Pedro U. Lima and Jose Avendaño    
Currently, there is a lack of developer-friendly software tools to formally address multi-robot coordination problems and obtain robust, efficient, and predictable strategies. This paper introduces a software toolbox that encapsulates, in one single pack... ver más
Revista: Applied Sciences

 
Guizhe Song, Degen Huang and Zhifeng Xiao    
Multilingual characteristics, lack of annotated data, and imbalanced sample distribution are the three main challenges for toxic comment analysis in a multilingual setting. This paper proposes a multilingual toxic text classifier which adopts a novel fus... ver más
Revista: Information

 
Rashel Fam and Yves Lepage    
In this paper, we inspect the theoretical problem of counting the number of analogies between sentences contained in a text. Based on this, we measure the analogical density of the text. We focus on analogy at the sentence level, based on the level of fo... ver más
Revista: Information

 
Giulio Carducci, Giuseppe Rizzo, Diego Monti, Enrico Palumbo and Maurizio Morisio    
We are what we do, like, and say. Numerous research efforts have been pushed towards the automatic assessment of personality dimensions relying on a set of information gathered from social media platforms such as list of friends, interests of musics and ... ver más
Revista: Information