Applied Sciences, Vol. 13, No. 7 (2023)
ARTICLE
TITLE

A Neural Topic Modeling Study Integrating SBERT and Data Augmentation

Huaqing Cheng    
Shengquan Liu    
Weiwei Sun and Qi Sun    

Abstract

Topic models extract consistent themes from large corpora for research purposes. In recent years, combining pretrained language models with neural topic models has attracted growing attention among scholars. However, this approach has a drawback on short texts: the topics obtained are of low quality and incoherent, because short texts have lower word frequencies (insufficient word co-occurrence) than long texts. To address this issue, we propose a neural topic model based on SBERT and data augmentation. First, our proposed easy data augmentation (EDA) method with keyword combination helps overcome the sparsity problem in short texts. Second, an attention mechanism focuses on keywords related to the topic and reduces the impact of noise words. Third, the SBERT model, which is trained on a large and diverse dataset, generates high-quality semantic vectors for short texts. Finally, we fuse the attention-weighted augmented data with these semantic vectors and input the fused features into a neural topic model to obtain high-quality topics. Experimental results on a public English dataset show that our model generates high-quality topics, with average scores improving by 2.5% for topic coherence and 1.2% for topic diversity compared to the baseline model.
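
The abstract does not spell out the augmentation rules, the fusion operator, or the exact metric definitions, so the following is only a minimal sketch under stated assumptions. It illustrates three of the ideas mentioned: an EDA-style augmentation that never deletes a given keyword, feature fusion (assumed here to be plain concatenation of a bag-of-words vector with a sentence embedding), and topic diversity (assumed here to be the fraction of unique words among the top words of all topics, one common definition). All function names are hypothetical, the keyword set is supplied by the caller, and the sentence embedding is a stand-in list of floats rather than real SBERT output.

```python
import random
from collections import Counter


def eda_with_keywords(tokens, keywords, rng, p_delete=0.3):
    """EDA-style augmentation (assumed form): random deletion that
    never drops a keyword, followed by one random swap of positions."""
    kept = [t for t in tokens if t in keywords or rng.random() > p_delete]
    if len(kept) >= 2:
        i, j = rng.sample(range(len(kept)), 2)
        kept[i], kept[j] = kept[j], kept[i]
    return kept


def bow_vector(tokens, vocab):
    """Bag-of-words counts of `tokens` over a fixed vocabulary order."""
    counts = Counter(tokens)
    return [float(counts[w]) for w in vocab]


def fuse(bow, sentence_embedding):
    """Feature fusion, assumed here to be simple concatenation."""
    return list(bow) + list(sentence_embedding)


def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of every topic."""
    words = [w for topic in topics for w in topic[:top_k]]
    if not words:
        return 0.0
    return len(set(words)) / len(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    tokens = "cheap flights to paris this weekend".split()
    keywords = {"flights", "paris"}  # hypothetical keyword set
    augmented = eda_with_keywords(tokens, keywords, rng)

    vocab = sorted(set(tokens))
    placeholder_embedding = [0.1, -0.2, 0.3]  # stand-in for an SBERT vector
    features = fuse(bow_vector(augmented, vocab), placeholder_embedding)

    print(augmented)
    print(len(features))
    print(topic_diversity([["flight", "paris"], ["paris", "hotel"]], top_k=2))
```

In a fuller pipeline the placeholder embedding would come from a sentence encoder and the fused vector would feed the topic model's encoder; both steps are left out here because the abstract does not describe them in enough detail.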
