Information, Vol. 12, Issue 4 (2021)
TITLE

A 2D Convolutional Gating Mechanism for Mandarin Streaming Speech Recognition

Xintong Wang and Chuangang Zhao    

Abstract

Recent research shows that the recurrent neural network Transducer (RNN-T) architecture has become a mainstream approach for streaming speech recognition. In this work, we investigate the VGG2 network as the input layer to the RNN-T in streaming speech recognition. Specifically, before the input features are passed to the RNN-T, we introduce a gated-VGG2 block, which uses the first two layers of VGG16 to extract contextual information in the time domain and then uses an SENet-style gating mechanism to control which information in the channel domain is propagated to the RNN-T. The results show that the RNN-T model with the proposed gated-VGG2 block brings a significant performance improvement over the existing RNN-T model, and that it has lower latency and a lower character error rate than the Transformer-based model.
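
The channel gating described in the abstract can be illustrated with a minimal sketch of a squeeze-and-excitation (SENet-style) gate in plain Python. This is not the authors' implementation: the convolutional VGG layers are omitted, and the input layout (channels x time x frequency), the reduction ratio, the layer sizes, and the random weights are all assumptions chosen for illustration. The function names `squeeze`, `dense`, and `se_gate` are hypothetical.

```python
import math
import random


def squeeze(feature_maps):
    """Global average pooling: reduce each (time x freq) map to one scalar."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def dense(vector, weights, biases):
    """Fully connected layer: `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weights, biases)
    ]


def se_gate(feature_maps, w1, b1, w2, b2):
    """Scale each channel by a sigmoid gate computed from channel statistics."""
    pooled = squeeze(feature_maps)                          # squeeze step
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]   # bottleneck + ReLU
    gates = [1.0 / (1.0 + math.exp(-g)) for g in dense(hidden, w2, b2)]
    gated = [
        [[g * value for value in row] for row in fmap]
        for fmap, g in zip(feature_maps, gates)
    ]
    return gated, gates


# Assumed toy sizes: 4 channels, 3 time frames, 2 frequency bins, reduction 2.
rng = random.Random(0)
C, T, F, R = 4, 3, 2, 2
features = [
    [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(T)]
    for _ in range(C)
]
w1 = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C // R)]
b1 = [0.0] * (C // R)
w2 = [[rng.uniform(-1, 1) for _ in range(C // R)] for _ in range(C)]
b2 = [0.0] * C

gated, gates = se_gate(features, w1, b1, w2, b2)
print([round(g, 3) for g in gates])  # one gate value in (0, 1) per channel
```

Each gate lies strictly between 0 and 1, so a channel is attenuated rather than switched off outright. In a trained model the weights would be learned jointly with the rest of the network, letting the gate decide which channels are passed on to the RNN-T.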
