Redirigiendo al acceso original de articulo en 15 segundos...
ARTÍCULO
TITULO

Enhancing Speech Emotions Recognition Using Multivariate Functional Data Analysis

Matthieu Saumard    

Resumen

Speech Emotions Recognition (SER) has gained significant attention in the fields of human?computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCC) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech, capturing essential emotional cues more effectively. Consequently, this enhancement significantly contributes to the learning process of SER methods without compromising performance. Subsequently, we employ a supervised learning model, specifically a functional Support Vector Machine (SVM), directly on the MFCC represented as functional data. This enables the utilization of the full functional information, allowing for more accurate emotion recognition. The proposed approach is rigorously evaluated on two distinct databases, EMO-DB and IEMOCAP, serving as benchmarks for SER evaluation. Our method demonstrates competitive results in terms of accuracy, showcasing its effectiveness in emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, our novel approach of treating MFCCs as multivariate functional data objects exhibits superior performance in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human?computer interaction and enabling more sophisticated emotion-aware applications.

 Artículos similares