Computation, Vol. 5, No. 2 (2017)
ARTICLE
TITLE

Deep Visual Attributes vs. Hand-Crafted Audio Features on Multidomain Speech Emotion Recognition

Michalis Papakostas    
Evaggelos Spyrou    
Theodoros Giannakopoulos    
Giorgos Siantikos    
Dimitrios Sgouropoulos    
Phivos Mylonas and Fillia Makedon    

Abstract

Emotion recognition from speech may play a crucial role in many applications related to human-computer interaction or to understanding the affective state of users in certain tasks, where other modalities such as video or physiological parameters are unavailable. In general, a human's emotions may be recognized using several modalities, such as facial expressions, speech, and physiological parameters (e.g., electroencephalograms, electrocardiograms). However, measuring these modalities may be difficult, obtrusive, or require expensive hardware. In that context, speech may be the best alternative modality in many practical applications. In this work we present an approach that uses a Convolutional Neural Network (CNN) functioning as a visual feature extractor and trained using raw speech information. In contrast to traditional machine learning approaches, CNNs are responsible for identifying the important features of the input, thus making hand-crafted feature engineering optional in many tasks. In this paper no features are required other than the spectrogram representations; hand-crafted features were extracted only to validate our method. Moreover, the approach does not require any linguistic model and is not specific to any particular language. We compare the proposed approach using cross-language datasets and demonstrate that it provides superior results compared with traditional approaches that use hand-crafted features.
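The spectrogram representation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `spectrogram`, the 16 kHz sampling rate, the 25 ms frame (400 samples), the 10 ms hop (160 samples), the Hann window and the log compression are all assumptions chosen for illustration. The sketch only shows how a raw waveform might be turned into a 2-D, image-like array that a CNN could take as input.

```python
import numpy as np

def spectrogram(signal, frame_len=400, hop=160):
    """Log-magnitude short-time Fourier spectrogram, shape (freq_bins, frames).

    frame_len and hop are illustrative defaults (25 ms / 10 ms at 16 kHz),
    not values taken from the paper.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Magnitude of the one-sided FFT of each frame.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Log compression keeps quiet components visible; transpose to (freq, time).
    return np.log1p(magnitude).T

# Usage: one second of a synthetic 440 Hz tone sampled at an assumed 16 kHz.
sr = 16000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (201, 98)
```

The resulting array can be treated like a single-channel image. Any resizing or normalization applied before it reaches a CNN would be a further design choice that the abstract does not specify.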

Similar articles

       
 
Hao Liu, Bo Yang and Zhiwen Yu    
Multimodal sarcasm detection is a developing research field in social Internet of Things, which is the foundation of artificial intelligence and human psychology research. Sarcastic comments issued on social media often imply people's real attitudes towa...
Journal: Applied Sciences

 
Qiuyue Li, Hao Sheng, Mingxue Sheng and Honglin Wan    
Efficient document recognition and sharing remain challenges in the healthcare, insurance, and finance sectors. One solution to this problem has been the use of deep learning techniques to automatically extract structured information from paper documents...
Journal: Applied Sciences

 
Yuhuan Wu and Yonghong Wu    
Salient object detection (SOD) aims to identify the most visually striking objects in a scene, simulating the function of the biological visual attention system. The attention mechanism in deep learning is commonly used as an enhancement strategy which e...
Journal: Algorithms

 
Noor Ul Ain Tahir, Zuping Zhang, Muhammad Asim, Junhong Chen and Mohammed ELAffendi    
Enhancing the environmental perception of autonomous vehicles (AVs) in intelligent transportation systems requires computer vision technology to be effective in detecting objects and obstacles, particularly in adverse weather conditions. Adverse weather ...
Journal: Algorithms

 
May Alsaidi, Nadim Obeid, Nailah Al-Madi, Hazem Hiary and Ibrahim Aljarah    
Autism spectrum disorder (ASD) is a developmental disorder that encompasses difficulties in communication (both verbal and non-verbal), social skills, and repetitive behaviors. The diagnosis of autism spectrum disorder typically involves specialized proc...
Journal: Information