Chinese–Vietnamese Pseudo-Parallel Sentences Extraction Based on Image Information Fusion

Yonghua Wen

Junjun Guo

Zhiqiang Yu and Zhengtao Yu

Abstract

Parallel sentences play a crucial role in many NLP tasks, particularly cross-lingual tasks such as machine translation. However, because manual construction is time-consuming and laborious, many low-resource languages still lack large-scale parallel data. Pseudo-parallel sentence extraction aims to automatically identify sentence pairs in different languages that convey similar meanings. Earlier methods relied heavily on parallel data, which makes them unsuitable for low-resource scenarios. Current mainstream research uses transfer learning or unsupervised learning based on cross-lingual word embeddings and multilingual pre-trained models; however, these methods are ineffective for languages that differ substantially. To address this issue, we propose a sentence extraction method that leverages image information fusion to extract Chinese–Vietnamese pseudo-parallel sentences from collections of bilingual texts. Our method first employs an adaptive image and text feature fusion strategy to efficiently extract bilingual parallel sentence pairs, and then applies a multimodal fusion method to balance the information between the image and text modalities. Experiments on multiple benchmarks show that, by infusing additional external image information, our method achieves promising results compared with a competitive baseline.

Keywords

neural machine translation - pseudo-parallel sentence extraction - image information fusion

ISSUE

Volume: 14 Issue: 5 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/info14050298
