ARTÍCULO
TITULO

Q8VaxStance: Dataset Labeling System for Stance Detection towards Vaccines in Kuwaiti Dialect

Hana Alostad    
Shoug Dawiek and Hasan Davulcu    

Resumen

The Kuwaiti dialect is a particular dialect of Arabic spoken in Kuwait; it differs significantly from standard Arabic and the dialects of neighboring countries in the same region. Few research papers with a focus on the Kuwaiti dialect have been published in the field of NLP. In this study, we created Kuwaiti dialect language resources using Q8VaxStance, a vaccine stance labeling system for a large dataset of tweets. This dataset fills this gap and provides a valuable resource for researchers studying vaccine hesitancy in Kuwait. Furthermore, it contributes to the Arabic natural language processing field by providing a dataset for developing and evaluating machine learning models for stance detection in the Kuwaiti dialect. The proposed vaccine stance labeling system combines the benefits of weak supervised learning and zero-shot learning; for this purpose, we implemented 52 experiments on 42,815 unlabeled tweets extracted between December 2020 and July 2022. The results of the experiments show that using keyword detection in conjunction with zero-shot model labeling functions is significantly better than using only keyword detection labeling functions or just zero-shot model labeling functions. Furthermore, for the total number of generated labels, the difference between using the Arabic language in both the labels and prompt or a mix of Arabic labels and an English prompt is statistically significant, indicating that it generates more labels than when using English in both the labels and prompt. The best accuracy achieved in our experiments in terms of the Macro-F1 values was found when using keyword and hashtag detection labeling functions in conjunction with zero-shot model labeling functions, specifically in experiments KHZSLF-EE4 and KHZSLF-EA1, with values of 0.83 and 0.83, respectively. Experiment KHZSLF-EE4 was able to label 42,270 tweets, while experiment KHZSLF-EA1 was able to label 42,764 tweets. Finally, the average value of annotation agreement between the generated labels and human labels ranges between 0.61 and 0.64, which is considered a good level of agreement.

 Artículos similares

       
 
Kai Zheng, Jiansheng Li, Lei Ding, Jianfeng Yang, Xucheng Zhang and Xun Zhang    
The segmentation of cloud and snow in satellite images is a key step for subsequent image analysis, interpretation, and other applications. In this paper, a cloud and snow segmentation method based on a deep convolutional neural network (DCNN) with enhan... ver más

 
Muhammed Enes Atik, Zaide Duran and Dursun Zafer Seker    
3D scene classification has become an important research field in photogrammetry, remote sensing, computer vision and robotics with the widespread usage of 3D point clouds. Point cloud classification, called semantic labeling, semantic segmentation, or s... ver más

 
Qinggang Gao, Joseph Molloy and Kay W. Axhausen    
We studied trip purpose imputation using data mining and machine learning techniques based on a dataset of GPS-based trajectories gathered in Switzerland. With a large number of labeled activities in eight categories, we explored location information usi... ver más

 
Zhen Ye, Yusheng Xu, Rong Huang, Xiaohua Tong, Xin Li, Xiangfeng Liu, Kuifeng Luan, Ludwig Hoegner and Uwe Stilla    
The semantic labeling of the urban area is an essential but challenging task for a wide variety of applications such as mapping, navigation, and monitoring. The rapid advance in Light Detection and Ranging (LiDAR) systems provides this task with a possib... ver más

 
Aleksandar Milosavljevic    
The proliferation of high-resolution remote sensing sensors and platforms imposes the need for effective analyses and automated processing of high volumes of aerial imagery. The recent advance of artificial intelligence (AI) in the form of deep learning ... ver más