A 2D Convolutional Gating Mechanism for Mandarin Streaming Speech Recognition

Xintong Wang and Chuangang Zhao

Resumen

Recent research shows recurrent neural network-Transducer (RNN-T) architecture has become a mainstream approach for streaming speech recognition. In this work, we investigate the VGG2 network as the input layer to the RNN-T in streaming speech recognition. Specifically, before the input feature is passed to the RNN-T, we introduce a gated-VGG2 block, which uses the first two layers of the VGG16 to extract contextual information in the time domain, and then use a SEnet-style gating mechanism to control what information in the channel domain is to be propagated to RNN-T. The results show that the RNN-T model with the proposed gated-VGG2 block brings significant performance improvement when compared to the existing RNN-T model, and it has a lower latency and character error rate than the Transformer-based model.

Palabras claves

streaming speech recognition - RNN-Transducer - gating mechanism - VGG

Acceso

PÁGINAS

pp. 0 - 0

NÚMERO

Volumen: 12 Parte: 4 (2021)

MATERIAS

INGENIERÍA Y CONSTRUCCIÓN CIVIL
TECNOLOGÍA

REVISTAS SIMILARES

Applied Sciences

DOI