Applied Sciences, Vol. 12, No. 24 (2022)
ARTICLE
TITLE

Research on Speech Emotion Recognition Method Based A-CapsNet

Yingmei Qi, Heming Huang and Huiyun Zhang

Abstract

Speech emotion recognition is an important research direction in speech recognition. To improve recognition performance, researchers have worked continuously on data augmentation, feature extraction, and pattern formation. To address limited speech data resources and overfitting during model training, this paper proposes A-CapsNet, a neural network model based on data augmentation. To alleviate data scarcity, noise from the Noisex-92 database is first combined with four data division methods: emotion-independent random-division (EIRD), emotion-dependent random-division (EDRD), emotion-independent cross-validation (EICV), and emotion-dependent cross-validation (EDCV). The EMODB database is then used to analyze and compare the performance of the proposed model under different signal-to-noise ratios. The results show that both the proposed model and the data augmentation are effective.
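As an illustration of noise-based augmentation at a chosen signal-to-noise ratio, the following is a minimal sketch, not the authors' implementation. It assumes plain additive mixing, which the abstract does not spell out, and it uses synthetic signals in place of EMODB utterances and Noisex-92 recordings. The function names `mix_at_snr` and `measured_snr_db` and the SNR values in the loop are hypothetical.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Assumption: simple additive mixing; the paper's exact procedure may differ.
    """
    if not speech or not noise:
        raise ValueError("speech and noise must be non-empty")
    if len(noise) < len(speech):
        # Repeat the noise so that it covers the whole utterance.
        noise = noise * (len(speech) // len(noise) + 1)
    noise = noise[:len(speech)]

    speech_power = sum(s * s for s in speech) / len(speech)
    noise_power = sum(n * n for n in noise) / len(noise)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise must both have non-zero power")

    # SNR_dB = 10 * log10(P_speech / (gain^2 * P_noise)); solve for gain.
    gain = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Measure the SNR of a mixture, given the clean signal it was built from."""
    residual = [y - x for x, y in zip(clean, noisy)]
    clean_power = sum(x * x for x in clean) / len(clean)
    residual_power = sum(r * r for r in residual) / len(residual)
    return 10 * math.log10(clean_power / residual_power)


# Synthetic stand-ins: a 440 Hz tone for "speech", Gaussian samples for "noise".
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(4000)]

# Hypothetical SNR levels; each one yields an extra noisy copy of the utterance.
augmented = {snr: mix_at_snr(speech, noise, snr) for snr in (-5, 0, 5, 10)}
```

Each clean utterance yields one additional training sample per SNR level, which is how this kind of augmentation enlarges a small corpus.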

Similar articles

       
 
Jialin Zhang, Mairidan Wushouer, Gulanbaier Tuerhong and Hanfang Wang    
Emotional speech synthesis is an important branch of human-computer interaction technology that aims to generate emotionally expressive and comprehensible speech based on the input text. With the rapid development of speech synthesis technology based on ...
Journal: Applied Sciences

 
Tina Gabrovec, Jana Dragar, Domen Guzelj, Petra Povalej Bržan and Janez Rebol
This research aims to determine whether a neural response telemetry (NRT) threshold determines the success of surgery. Furthermore, we examined whether the patient's age, the etiology of their hearing loss, the depth of the electrode insertion, and a slo...
Journal: Applied Sciences

 
Ji-Yeoun Lee, Ji-Hye Park, Ji-Na Lee and Ah-Ra Jung    
Examining the relationship between the prognostic factors and the effectiveness of voice therapy is a crucial step in developing personalized treatment strategies for individuals with voice disorders. This study recommends using the multilayer perceptron...
Journal: Applied Sciences

 
Taiki Arakane and Takeshi Saitoh    
This paper studies various deep learning models for word-level lip-reading technology, one of the tasks in the supervised learning of video classification. Several public datasets have been published in the lip-reading research field. However, few studie...
Journal: Algorithms

 
Shuo Chen, Hartmut Helmke, Robert M. Tarakan, Oliver Ohneiser, Hunter Kopald and Matthias Kleinert    
As researchers around the globe develop applications for the use of Automatic Speech Recognition and Understanding (ASRU) in the Air Traffic Management (ATM) domain, Air Traffic Control (ATC) language ontologies will play a critical role in enabling rese...
Journal: Aerospace