Applied Sciences, Vol. 13, No. 2 (2023)
ARTICLE
TITLE

Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data

Atnafu Lambebo Tonja    
Olga Kolesnikova    
Alexander Gelbukh and Grigori Sidorov    

Abstract

Despite many proposed solutions, neural machine translation (NMT) for low-resource languages remains difficult. The problem is harder still when the few available resources cover only a single domain. In this paper, we discuss the applicability of a source-side monolingual dataset of low-resource languages to improve the NMT system for such languages. In our experiments, we used Wolaytta–English as the low-resource language pair. We discuss the use of self-learning and fine-tuning approaches to improve the NMT system for Wolaytta–English translation using both authentic and synthetic datasets. The self-learning approach showed +2.7 and +2.4 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively, over the best-performing baseline model. Further fine-tuning the best-performing self-learning model showed +1.2 and +0.6 BLEU score improvements for Wolaytta–English and English–Wolaytta translations, respectively. We reflect on our contributions and plan for the future of this difficult field of study.

Similar articles

       
 
Wenbo Zhang, Xiao Li, Yating Yang and Rui Dong    
The pre-training and fine-tuning approach has been shown to be effective for low-resource neural machine translation. In this approach, pre-trained models trained on monolingual data are used to initialize translation models to transfer knowledge from monolingual dat...
Journal: Information

 
Rogelio Bautista-Sánchez, Liliana Ibeth Barbosa-Santillan and Juan Jaime Sánchez-Escobar    
Predicting maritime vessel navigation has become an exciting topic in recent years, especially considering economics, commercial exchange, and security. In addition, vessel monitoring requires better systems and techniques that help enterprises ...
Journal: Applied Sciences

 
Tessfu Geteye Fantaye, Junqing Yu and Tulu Tilahun Hailu    
Deep neural networks (DNNs) have achieved great success in acoustic modeling for speech recognition tasks. Of these networks, the convolutional neural network (CNN) is effective at representing the local properties of speech formants. Howev...
Journal: Computers

 
Jaco Badenhorst and Febe de Wet    
When the National Centre for Human Language Technology (NCHLT) Speech corpus was released, it created various opportunities for speech technology development in the 11 official, but critically under-resourced, languages of South Africa. Since then, the s...
Journal: Information

 
Sardar Parhat, Mijit Ablimit and Askar Hamdulla    
In this paper, we study short text classification for two similar low-resource languages, Uyghur and Kazakh, using a multilingual morphological analyzer. Generally, the online linguistic resources of these languages are noisy. So a preprocessing i...
Journal: Information