Applied Sciences, Vol. 12, Issue 23 (2022)
ARTICLE
TITLE

Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study

Tahani Alqurashi    

Abstract

Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in the language recognition and natural language processing fields. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in short written texts. The study focused on the four main Saudi dialects spoken across the country. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, character-based modelling (deep learning and classical machine learning), and model performance evaluation. Several classical machine learning methods, including logistic regression, stochastic gradient descent, variations of the naive Bayes models, and support vector classification, were applied to the dataset. For deep learning, a character-level convolutional neural network (CNN) model was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three- and four-way ADI tasks. The results revealed that the classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency-inverse document frequency (TF-IDF) weighting combined with character n-grams ranging from unigrams to four-grams achieved the best performance among the tested parameters.
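As an illustration of the character-based feature extraction phase, the following is a minimal sketch of TF-IDF weighting over character n-grams from unigrams to four-grams. It is not the author's implementation: the function names `char_ngrams` and `tfidf_vectors`, the smoothed IDF formula, the L2 normalisation, and the sample strings are all assumptions made for the example.

```python
import math
from collections import Counter


def char_ngrams(text, n_min=1, n_max=4):
    """Return every character n-gram of length n_min..n_max found in text."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def tfidf_vectors(docs, n_min=1, n_max=4):
    """Map each document to a dict {n-gram: TF-IDF weight}, L2-normalised.

    Assumed weighting (not taken from the paper):
    idf(g) = ln((1 + N) / (1 + df(g))) + 1
    """
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter(g for c in counts for g in c)

    vectors = []
    for c in counts:
        vec = {
            g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1.0)
            for g, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


# Made-up placeholder strings, not samples from the study's dataset.
docs = ["abab", "abcd"]
vectors = tfidf_vectors(docs)
```

Because the features are built from characters rather than words, any string can be vectorised without a fixed word vocabulary. In practice a library vectoriser such as scikit-learn's `TfidfVectorizer(analyzer="char", ngram_range=(1, 4))` would typically replace this hand-written version, and its output would feed a classifier such as logistic regression.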
