Inicio  /  Applied Sciences  /  Vol: 11 Par: 5 (2021)  /  Artículo
ARTÍCULO
TITULO

AraSenCorpus: A Semi-Supervised Approach for Sentiment Annotation of a Large Arabic Text Corpus

Ali Al-Laith    
Muhammad Shahbaz    
Hind F. Alaskar and Asim Rehmat    

Resumen

At a time when research in the field of sentiment analysis tends to study advanced topics in languages, such as English, other languages such as Arabic still suffer from basic problems and challenges, most notably the availability of large corpora. Furthermore, manual annotation is time-consuming and difficult when the corpus is too large. This paper presents a semi-supervised self-learning technique, to extend an Arabic sentiment annotated corpus with unlabeled data, named AraSenCorpus. We use a neural network to train a set of models on a manually labeled dataset containing 15,000 tweets. We used these models to extend the corpus to a large Arabic sentiment corpus called ?AraSenCorpus?. AraSenCorpus contains 4.5 million tweets and covers both modern standard Arabic and some of the Arabic dialects. The long-short term memory (LSTM) deep learning classifier is used to train and test the final corpus. We evaluate our proposed framework on two external benchmark datasets to ensure the improvement of the Arabic sentiment classification. The experimental results show that our corpus outperforms the existing state-of-the-art systems.

 Artículos similares

       
 
Olga Tushkanova, Diana Levshun, Alexander Branitskiy, Elena Fedorchenko, Evgenia Novikova and Igor Kotenko    
Cyberattacks on cyber-physical systems (CPS) can lead to severe consequences, and therefore it is extremely important to detect them at early stages. However, there are several challenges to be solved in this area; they include an ability of the security... ver más
Revista: Algorithms

 
Kokoy Siti Komariah, Ariana Tulus Purnomo, Ardianto Satriawan, Muhammad Ogin Hasanuddin, Casi Setianingsih and Bong-Kee Sin    
To pursue a healthy lifestyle, people are increasingly concerned about their food ingredients. Recently, it has become a common practice to use an online recipe to select the ingredients that match an individual?s meal plan and healthy diet preference. T... ver más
Revista: Informatics

 
Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim    
Large-scale datasets, which have sufficient and identical quantities of data in each class, are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of sufficient data and interclass imbalanced data dis... ver más
Revista: Applied Sciences

 
Julio Jerison E. Macrohon, Charlyn Nayve Villavicencio, X. Alphonse Inbaraj and Jyh-Horng Jeng    
With the increasing popularity of Twitter as both a social media platform and a data source for companies, decision makers, advertisers, and even researchers alike, data have been so massive that manual labeling is no longer feasible. This research uses ... ver más
Revista: Information

 
Nariman Adel Hussein, Hoda M. O. Mokhtar and Mohamed E. El-Sharkawi    
Community search is a basic problem in graph analysis. In many applications, network nodes have certain properties that are important for the community to make sense of the application; hence, attributes are associated with nodes to capture their propert... ver más
Revista: Information