Applied Sciences, Vol. 13, Issue 14 (2023)

High-Quality Data from Crowdsourcing towards the Creation of a Mexican Anti-Immigrant Speech Corpus

Alejandro Molina-Villegas, Thomas Cattin, Karina Gazca-Hernandez and Edwin Aldana-Bobadilla

Abstract

Currently, a significant portion of published research on online hate speech relies on existing textual corpora. However, when examining a specific context, there is a lack of preexisting datasets that include the particularities associated with various conditions (e.g., geographic and cultural). This issue is evident in the case of online anti-immigrant speech in Mexico, where available data to study this emergent and often overlooked phenomenon are scarce. In light of this situation, we propose a novel methodology wherein three domain experts annotate a certain number of texts related to the subject. We establish a precise control mechanism based on these annotations to evaluate non-expert annotators. The evaluation of the contributors is implemented in a custom annotation platform, enabling us to conduct a controlled crowdsourcing campaign and assess the reliability of the obtained data. Our results demonstrate that a combination of crowdsourced and expert data leads to iterative improvements, not only in the accuracy achieved by various machine learning classification models (reaching 0.8828) but also in the model's adaptation to the specific characteristics of hate speech in the Mexican Twittersphere context. In addition to these methodological innovations, the most significant contribution of our work is the creation of the first online Mexican anti-immigrant training corpus for machine-learning-based detection tasks.
