AI-Generated Text Detector for Arabic Language Using Encoder-Based Transformer Architecture

Hamed Alshammari, Ahmed El-Sayed and Khaled Elleithy

Abstract

Existing AI detectors perform notably worse on Arabic texts. This study introduces a novel AI text classifier designed specifically for Arabic, addressing the distinct challenges of processing this language. Particular focus is placed on accurately recognizing human-written texts (HWTs), an area where existing AI detectors have shown significant limitations. To this end, two Transformer-based models, AraELECTRA and XLM-R, were fine-tuned on two distinct datasets: a large dataset of 43,958 examples and a custom dataset of 3,078 examples containing HWTs and AI-generated texts (AIGTs) from various sources, including ChatGPT 3.5, ChatGPT-4, and BARD. The proposed architecture is adaptable to any language, but this work evaluates the models' effectiveness in distinguishing HWTs from AIGTs in Arabic, taken as an example of a Semitic language. The proposed models were compared against two prominent existing AI detectors, GPTZero and OpenAI Text Classifier, on the AIRABIC benchmark dataset. The proposed classifiers outperform both, reaching 81% accuracy compared with 63% for GPTZero and 50% for OpenAI Text Classifier. Furthermore, adding a Dediacritization Layer before the classification model significantly improved detection of both HWTs and AIGTs, raising classification accuracy from 81% to as high as 99% and, in some instances, to 100%.
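
The abstract does not describe how the Dediacritization Layer is implemented. As a rough illustration only, a minimal sketch of stripping Arabic diacritics before classification might look like the following. The function name `dediacritize` and the exact set of code points removed (the common tashkeel range U+064B to U+0652 plus the superscript alef U+0670) are assumptions made for this example, not details taken from the paper.

```python
import re

# Assumption: the marks to strip are the common Arabic tashkeel
# (U+064B..U+0652: tanween, fatha, damma, kasra, shadda, sukun)
# plus the superscript alef (U+0670). The paper's own set may differ.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")


def dediacritize(text: str) -> str:
    """Return `text` with Arabic diacritical marks removed.

    Hypothetical helper: base letters and all non-Arabic
    characters are left untouched.
    """
    return ARABIC_DIACRITICS.sub("", text)


if __name__ == "__main__":
    # "kataba" written with a fatha on each of its three letters
    sample = "\u0643\u064E\u062A\u064E\u0628\u064E"
    print(dediacritize(sample))
```

In a pipeline of the kind the abstract describes, a step like this would presumably run on each input text before tokenization, so that the classifier sees the same surface form whether or not the writer used diacritics.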
