Applied Sciences  /  Vol. 12, Issue 21 (2022)
ARTICLE
TITLE

Identification and Visualization of Key Topics in Scientific Publications with Transformer-Based Language Models and Document Clustering Methods

Min-Hsien Weng, Shaoqun Wu and Mark Dyer

Abstract

With the rapidly growing number of scientific publications, researchers face an increasing challenge in discovering the current research topics and methodologies in a scientific domain. This paper describes an unsupervised topic detection approach that combines recently developed transformer-based GPT-3 (Generative Pretrained Transformer 3) similarity embedding models with modern document clustering techniques. In total, 593 publication abstracts across the urban study and machine learning domains were used as a case study to demonstrate the three phases of our approach. The iterative clustering phase uses GPT-3 embeddings to represent the semantic meaning of abstracts and applies the HDBSCAN (Hierarchical Density-based Spatial Clustering of Applications with Noise) clustering algorithm, together with silhouette scores, to group similar abstracts. The keyword extraction phase identifies candidate words in each abstract and selects keywords using the Maximal Marginal Relevance ranking algorithm. The keyword grouping phase produces keyword groups that represent the topics in each abstract cluster, again using GPT-3 embeddings, the HDBSCAN algorithm, and silhouette scores. The results are visualized in a web-based interactive tool that allows users to explore abstract clusters and examine the topics in each cluster through keyword grouping. Our unsupervised topic detection approach does not require labeled datasets for training and has the potential to be used for bibliometric analysis of large collections of publications.
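The keyword extraction phase relies on Maximal Marginal Relevance (MMR), which balances how relevant a candidate is to the abstract against how redundant it is with the keywords already chosen. The following is a minimal sketch of generic MMR ranking, not the authors' implementation. It assumes cosine similarity, and the `diversity` weight, the function names `cosine` and `mmr_select`, and the two-dimensional toy vectors (stand-ins for real GPT-3 embeddings) are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec, candidates, top_n=2, diversity=0.5):
    """Pick up to top_n keywords with Maximal Marginal Relevance.

    candidates maps each candidate word to its embedding vector.
    diversity (assumed parameter) weights redundancy against relevance:
    0.0 ranks by relevance only, higher values favor dissimilar keywords.
    """
    remaining = dict(candidates)
    selected = []
    while remaining and len(selected) < top_n:

        def score(word):
            # Relevance: similarity of the candidate to the whole abstract.
            relevance = cosine(remaining[word], doc_vec)
            # Redundancy: highest similarity to any keyword already selected.
            redundancy = max(
                (cosine(remaining[word], candidates[s]) for s in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy vectors (hypothetical); real inputs would be embedding vectors.
doc_vec = [1.0, 1.0]
candidates = {
    "clustering": [1.0, 0.9],
    "cluster analysis": [1.0, 0.8],
    "visualization": [0.2, 1.0],
}

print(mmr_select(doc_vec, candidates, top_n=2, diversity=0.5))
print(mmr_select(doc_vec, candidates, top_n=2, diversity=0.0))
```

With `diversity=0.0` the ranking reduces to plain relevance and returns two near-synonyms. Raising it penalizes candidates that resemble keywords already selected, which is the property that makes MMR useful for producing varied keywords per abstract.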
