End-to-End Mispronunciation Detection and Diagnosis Using Transfer Learning

Linkai Peng

Yingming Gao

Rian Bao

Ya Li and Jinsong Zhang

Abstract

As an indispensable module of computer-aided pronunciation training (CAPT) systems, mispronunciation detection and diagnosis (MDD) has attracted considerable attention from academia and industry over the past decade. Training robust MDD models requires large amounts of human-annotated speech recordings, which are expensive and often hard to acquire. In this study, we propose to use transfer learning to tackle data scarcity from two aspects. First, from the audio modality, we explore the use of the pretrained model wav2vec2.0 for MDD tasks by learning robust general acoustic representations. Second, from the text modality, we explore transferring prior texts into MDD by learning associations between the acoustic and textual modalities. We propose textual modulation gates that assign more importance to relevant text information while suppressing irrelevant text information. Moreover, given the transcriptions, we propose an extra contrastive loss to reduce the difference in learning objectives between the phoneme recognition and MDD tasks. Experiments on the L2-Arctic dataset showed that our wav2vec2.0-based models outperformed conventional methods. The proposed textual modulation gate and contrastive loss further improved the F1-score by more than 2.88%, and our best model achieved an F1-score of 61.75%.
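The abstract does not give the formula for the textual modulation gate. As a rough illustration of the general idea only, the sketch below assumes a generic element-wise sigmoid gate: each gate value lies between 0 and 1 and scales how much of a text feature is added to the acoustic feature. The function name `modulation_gate`, the per-dimension weights `w_a` and `w_t`, and the additive fusion are all hypothetical choices for this example, not the authors' formulation.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def modulation_gate(acoustic, text, w_a, w_t, bias):
    """Fuse one acoustic vector with one text vector through a sigmoid gate.

    All arguments are equal-length lists of floats (assumed shapes).
    For each dimension i:
        g_i     = sigmoid(w_a[i] * acoustic[i] + w_t[i] * text[i] + bias[i])
        fused_i = acoustic[i] + g_i * text[i]
    Returns the pair (fused, gates).
    """
    gates = [
        sigmoid(wa * a + wt * t + b)
        for a, t, wa, wt, b in zip(acoustic, text, w_a, w_t, bias)
    ]
    # A gate near 1 passes the text feature through; a gate near 0 suppresses it.
    fused = [a + g * t for a, g, t in zip(acoustic, gates, text)]
    return fused, gates


if __name__ == "__main__":
    acoustic = [1.0, 2.0]
    text = [2.0, 4.0]
    zeros = [0.0, 0.0]
    # With zero weights and bias, every gate is exactly 0.5.
    fused, gates = modulation_gate(acoustic, text, zeros, zeros, zeros)
    print(gates, fused)
```

In a trained model the weights would be learned, typically as full matrices applied to framewise representations rather than the per-dimension scalars used here for brevity.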

Keywords

mispronunciation detection and diagnosis (MDD) - computer-aided pronunciation training (CAPT) - transfer learning - pretrained model - text modulation gate

ISSUE

Volume: 13 Issue: 11 (2023)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

SIMILAR JOURNALS

Applied Sciences
Algorithms
Infrastructures

DOI

https://doi.org/10.3390/app13116793
