Future Internet, Vol. 13, No. 11 (2021)

Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets

Seid Muhie Yimam, Abinew Ali Ayele, Gopalakrishnan Venkatesh, Ibrahim Gashaw and Chris Biemann

Abstract

The availability of pre-trained semantic models has enabled the rapid development of machine learning components for downstream applications. However, even when texts are abundant for low-resource languages, very few semantic models are publicly available for them. Most publicly available pre-trained models are multilingual versions of semantic models, which do not fit the needs of low-resource languages well. We introduce different semantic models for Amharic, a morphologically complex Ethio-Semitic language. After investigating the publicly available pre-trained semantic models, we fine-tune two pre-trained models and train seven new models. The models include Word2Vec embeddings, a distributional thesaurus (DT), BERT-like contextual embeddings, and DT embeddings obtained via network embedding algorithms. Moreover, we employ these models for different NLP tasks and study their impact. We find that the newly trained models perform better than pre-trained multilingual models. Furthermore, models based on contextual embeddings from FLAIR and RoBERTa perform better than Word2Vec models for the NER and POS tagging tasks. DT-based network embeddings are suitable for the sentiment classification task. We publicly release all the semantic models, machine learning components, and several benchmark datasets for NER, POS tagging, and sentiment classification, as well as Amharic versions of WordSim353 and SimLex999.
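
The abstract mentions Amharic versions of WordSim353 and SimLex999. Benchmarks of this kind are commonly scored by the Spearman rank correlation between a model's cosine similarities and human similarity ratings. The sketch below illustrates that general procedure in plain Python. It is an assumption about how such a benchmark might be used, not the authors' evaluation code, and the embeddings, word keys, and ratings are hypothetical toy values.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, pairs):
    """Score embeddings on (word1, word2, human_rating) pairs.

    Pairs with an out-of-vocabulary word are skipped.
    Returns (spearman_rho, number_of_pairs_scored).
    """
    model_scores, human_scores = [], []
    for w1, w2, gold in pairs:
        if w1 in embeddings and w2 in embeddings:
            model_scores.append(cosine(embeddings[w1], embeddings[w2]))
            human_scores.append(gold)
    return spearman(model_scores, human_scores), len(human_scores)


# Hypothetical toy data: placeholder keys stand in for Amharic words.
toy_embeddings = {
    "w1": [1.0, 0.0],
    "w2": [0.9, 0.1],
    "w3": [0.0, 1.0],
    "w4": [0.5, 0.5],
}
toy_pairs = [
    ("w1", "w2", 9.0),
    ("w1", "w4", 5.0),
    ("w1", "w3", 1.0),
    ("w1", "missing", 3.0),  # skipped: "missing" has no embedding
]

rho, covered = evaluate(toy_embeddings, toy_pairs)
print(f"Spearman rho = {rho:.3f} over {covered} pairs")
```

Reporting the number of scored pairs alongside the correlation matters for a morphologically rich language, where many benchmark words may be missing from a model's vocabulary.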
