Research on Speech Emotion Recognition Method Based A-CapsNet

Yingmei Qi, Heming Huang and Huiyun Zhang

Abstract

Speech emotion recognition is an important research direction in speech recognition. To improve the performance of speech emotion recognition, researchers have worked extensively on data augmentation, feature extraction, and pattern formation. To address the problems of limited speech data resources and overfitting during model training, this research proposes A-CapsNet, a neural network model based on data augmentation methods. To alleviate data scarcity and achieve data augmentation, noise from the Noisex-92 database is first combined with four data division methods: emotion-independent random division (EIRD), emotion-dependent random division (EDRD), emotion-independent cross-validation (EICV), and emotion-dependent cross-validation (EDCV). The EMODB database is then used to analyze and compare the performance of the proposed model under different signal-to-noise ratios (SNRs), and the results show that both the proposed model and the data augmentation are effective.
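The abstract names two ingredients of the augmentation pipeline: noise from Noisex-92 mixed into speech at controlled signal-to-noise ratios, and four ways of dividing the data. The Python sketch below illustrates both under stated assumptions. It uses the common power-ratio formula to scale a noise segment so that the mixture reaches a target SNR. It also assumes that "emotion-dependent" means dividing each emotion class separately (stratified by emotion label) and "emotion-independent" means dividing the pooled data without regard to labels. The abstract does not define these terms, so the paper's exact procedure may differ. The function names (`add_noise_at_snr`, `random_division`, `cross_validation_folds`) and defaults (test ratio 0.2, k = 5) are hypothetical, and any 1-D NumPy arrays stand in for Noisex-92 and EMODB audio.

```python
import numpy as np


def add_noise_at_snr(speech, noise, snr_db, rng=None):
    """Mix noise into speech so the mixture has the requested SNR in dB.

    Assumption: standard power-ratio scaling, not necessarily the paper's exact method.
    """
    rng = np.random.default_rng() if rng is None else rng
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    # Repeat the noise if it is shorter than the utterance, then cut a random segment.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    start = rng.integers(0, len(noise) - len(speech) + 1)
    noise = noise[start:start + len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve 10*log10(p_speech / (k**2 * p_noise)) == snr_db for the gain k.
    k = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + k * noise


def _groups(labels, emotion_dependent):
    # Emotion-dependent (assumed): one index group per emotion class.
    # Emotion-independent (assumed): a single group holding every index.
    if emotion_dependent:
        return [np.flatnonzero(labels == c) for c in np.unique(labels)]
    return [np.arange(len(labels))]


def random_division(labels, test_ratio=0.2, emotion_dependent=False, seed=0):
    """Return (train_idx, test_idx): EIRD if emotion_dependent is False, else EDRD."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train, test = [], []
    for idx in _groups(labels, emotion_dependent):
        idx = rng.permutation(idx)
        n_test = int(round(test_ratio * len(idx)))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.sort(np.array(train, dtype=int)), np.sort(np.array(test, dtype=int))


def cross_validation_folds(labels, k=5, emotion_dependent=False, seed=0):
    """Yield (train_idx, test_idx) for each of k folds: EICV or EDCV."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    folds = [[] for _ in range(k)]
    for idx in _groups(labels, emotion_dependent):
        # Spread each group's shuffled indices as evenly as possible over the k folds.
        for i, part in enumerate(np.array_split(rng.permutation(idx), k)):
            folds[i].extend(part)
    all_idx = np.arange(len(labels))
    for fold in folds:
        test = np.sort(np.array(fold, dtype=int))
        yield np.setdiff1d(all_idx, test), test
```

One design choice worth noting: dividing the original utterance indices first and adding noise afterwards keeps noisy copies of the same utterance from landing in both the training and test sets. The abstract does not say in which order the paper applies these two steps.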

Keywords

speech emotion recognition - data augmentation - data division - network model

ISSUE

Volume: 12 Issue: 24 (2022)

SUBJECTS

CIVIL ENGINEERING AND CONSTRUCTION
TECHNOLOGY

DOI

https://doi.org/10.3390/app122412983
