REVISTA
Applied Sciences

TODAS

Redirigiendo al acceso original de articulo en 17 segundos...

Inicio / Applied Sciences / Vol: 12 Par: 1 (2022) / Artículo

ARTÍCULO

TITULO

Data Augmentation for Audio-Visual Emotion Recognition with an Efficient Multimodal Conditional GAN

Fei Ma

Yang Li

Shiguang Ni

Shao-Lun Huang and Lin Zhang

Resumen

Audio-visual emotion recognition is the research of identifying human emotional states by combining the audio modality and the visual modality simultaneously, which plays an important role in intelligent human-machine interactions. With the help of deep learning, previous works have made great progress for audio-visual emotion recognition. However, these deep learning methods often require a large amount of data for training. In reality, data acquisition is difficult and expensive, especially for the multimodal data with different modalities. As a result, the training data may be in the low-data regime, which cannot be effectively used for deep learning. In addition, class imbalance may occur in the emotional data, which can further degrade the performance of audio-visual emotion recognition. To address these problems, we propose an efficient data augmentation framework by designing a multimodal conditional generative adversarial network (GAN) for audio-visual emotion recognition. Specifically, we design generators and discriminators for audio and visual modalities. The category information is used as their shared input to make sure our GAN can generate fake data of different categories. In addition, the high dependence between the audio modality and the visual modality in the generated multimodal data is modeled based on Hirschfeld-Gebelein-Rényi (HGR) maximal correlation. In this way, we relate different modalities in the generated data to approximate the real data. Then, the generated data are used to augment our data manifold. We further apply our approach to deal with the problem of class imbalance. To the best of our knowledge, this is the first work to propose a data augmentation strategy with a multimodal conditional GAN for audio-visual emotion recognition. We conduct a series of experiments on three public multimodal datasets, including eNTERFACE?05, RAVDESS, and CMEW. The results indicate that our multimodal conditional GAN has high effectiveness for data augmentation of audio-visual emotion recognition.

Palabras claves

audio-visual emotion recognition - data augmentation - multimodal conditional generative adversarial network (GAN) - Hirschfeld-Gebelein-Rényi (HGR) maximal correlation

Acceso

PÁGINAS

pp. 0 - 0

NÚMERO

Volumen: 12 Parte: 1 (2022)

MATERIAS

INGENIERÍA Y CONSTRUCCIÓN CIVIL
TECNOLOGÍA

REVISTAS SIMILARES

Information
Water
Applied Sciences

DOI

https://doi.org/10.3390/app12010527

Artículos similares

Adversarial Attacks on Medical Segmentation Model via Transformation of Feature Statistics

Acceso

Woonghee Lee, Mingeon Ju, Yura Sim, Young Kul Jung, Tae Hyung Kim and Younghoon Kim

Deep learning-based segmentation models have made a profound impact on medical procedures, with U-Net based computed tomography (CT) segmentation models exhibiting remarkable performance. Yet, even with these advances, these models are found to be vulner... ver más

Revista: Applied Sciences

Microseismic Velocity Inversion Based on Deep Learning and Data Augmentation

Acceso

Lei Li, Xiaobao Zeng, Xinpeng Pan, Ling Peng, Yuyang Tan and Jianxin Liu

Microseismic monitoring plays an essential role for reservoir characterization and earthquake disaster monitoring and early warning. The accuracy of the subsurface velocity model directly affects the precision of event localization and subsequent process... ver más

Revista: Applied Sciences

Effect of Data Augmentation on Deep-Learning-Based Segmentation of Long-Axis Cine-MRI

Acceso

François Legrand, Richard Macwan, Alain Lalande, Lisa Métairie and Thomas Decourselle

Automated Cardiac Magnetic Resonance segmentation serves as a crucial tool for the evaluation of cardiac function, facilitating faster clinical assessments that prove advantageous for both practitioners and patients alike. Recent studies have predominant... ver más

Revista: Algorithms

Exploring the Efficacy of Base Data Augmentation Methods in Deep Learning-Based Radiograph Classification of Knee Joint Osteoarthritis

Acceso

Fabi Prezja, Leevi Annala, Sampsa Kiiskinen and Timo Ojala

Diagnosing knee joint osteoarthritis (KOA), a major cause of disability worldwide, is challenging due to subtle radiographic indicators and the varied progression of the disease. Using deep learning for KOA diagnosis requires broad, comprehensive dataset... ver más

Revista: Algorithms

Unraveling a Histopathological Needle-in-Haystack Problem: Exploring the Challenges of Detecting Tumor Budding in Colorectal Carcinoma Histology

Acceso

Daniel Rusche, Nils Englert, Marlen Runz, Svetlana Hetjens, Cord Langner, Timo Gaiser and Cleo-Aron Weis

Background: In this study focusing on colorectal carcinoma (CRC), we address the imperative task of predicting post-surgery treatment needs by identifying crucial tumor features within whole slide images of solid tumors, analogous to locating a needle in... ver más

Revista: Applied Sciences

Revistas destacadas

Acceso directo a los números publicados en la revista Infrastructures

Infrastructures

Acceso directo a los números publicados en la revista Informed Infraestructure

Informed Infraestructure

Acceso directo a los números publicados en la revista BiT

Acceso directo a los números publicados en la revista Revista de la Construcción

Revista de la Construcción

Ver todas las revistas