Redirigiendo al acceso original de articulo en 23 segundos...
Inicio  /  Information  /  Vol: 13 Par: 10 (2022)  /  Artículo
ARTÍCULO
TITULO

Zero-Shot Topic Labeling for Hazard Classification

Andrea Rondinelli    
Lorenzo Bongiovanni and Valerio Basile    

Resumen

Topic classification is the task of mapping text onto a set of meaningful labels known beforehand. This scenario is very common both in academia and industry whenever there is the need of categorizing a big corpus of documents according to set custom labels. The standard supervised approach, however, requires thousands of documents to be manually labelled, and additional effort every time the label taxonomy changes. To obviate these downsides, we investigated the application of a zero-shot approach to topic classification. In this setting, a subset of these topics, or even all of them, is not seen at training time, challenging the model to classify corresponding examples using additional information. We first show how zero-shot classification can perform the topic-classification task without any supervision. Secondly, we build a novel hazard-detection dataset by manually selecting tweets gathered by LINKS Foundation for this task, where we demonstrate the effectivenes of our cost-free method on a real-world problem. The idea is to leverage a pre-trained text-embedder (MPNet) to map both text and topics into the same semantic vector space where they can be compared. We demonstrate that these semantic spaces are better aligned when their dimension is reduced, keeping only the most useful information. We investigated three different dimensionality reduction techniques, namely, linear projection, autoencoding and PCA. Using the macro F1-score as the standard metric, it was found that PCA is the best performing technique, recording improvements for each dataset in comparison with the performance on the baseline.

 Artículos similares

       
 
Khaled Arbateni and Amir Benzaoui    
Electrocardiography (ECG) is a simple and safe tool for detecting heart conditions. Despite the diaspora of existing heartbeat classifiers, improvements such as real-time heartbeat identification and patient-independent classification persist. Reservoir ... ver más
Revista: Applied Sciences

 
Xingxing Tong, Ming Chen and Guofu Feng    
The issue of aquatic product quality and safety has gradually become a focal point of societal concern. Analyzing textual comments from people about aquatic products aids in promptly understanding the current sentiment landscape regarding the quality and... ver más
Revista: Applied Sciences

 
Waseem Abbas, Zuping Zhang, Muhammad Asim, Junhong Chen and Sadique Ahmad    
In the ever-expanding online fashion market, businesses in the clothing sales sector are presented with substantial growth opportunities. To utilize this potential, it is crucial to implement effective methods for accurately identifying clothing items. T... ver más
Revista: Information

 
Wandile Nhlapho, Marcellin Atemkeng, Yusuf Brima and Jean-Claude Ndogmo    
The advent of deep learning (DL) has revolutionized medical imaging, offering unprecedented avenues for accurate disease classification and diagnosis. DL models have shown remarkable promise for classifying brain tumors from Magnetic Resonance Imaging (M... ver más
Revista: Information

 
Luis M. de Campos, Juan M. Fernández-Luna, Juan F. Huete, Francisco J. Ribadas-Pena and Néstor Bolaños    
In the context of academic expert finding, this paper investigates and compares the performance of information retrieval (IR) and machine learning (ML) methods, including deep learning, to approach the problem of identifying academic figures who are expe... ver más
Revista: Algorithms