Fei Ma, Yang Li, Shiguang Ni, Shao-Lun Huang and Lin Zhang
Audio-visual emotion recognition, the study of identifying human emotional states by combining the audio and visual modalities, plays an important role in intelligent human-machine interaction. With the help of deep ...
Jinxiang Zeng, Du Zhang, Zhiyi Li and Xiaolin Li
Aiming at the audio event recognition problem of speech recognition, a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework is proposed. This method can adjust the model sound events for a long time a...
Stefan Wagenpfeil, Paul Mc Kevitt and Matthias Hemmje
Multimedia feature graphs are employed to represent features of images, video, audio, or text. Various techniques exist to extract such features from multimedia objects. In this paper, we describe the extension of such a feature graph to represent the me...
Liz Huancapaza Hilasaca, Milton Cezar Ribeiro and Rosane Minghim
Labeling of samples is a recurrent and time-consuming task in data analysis and machine learning, yet it is generally overlooked in terms of visual analytics approaches to improve the process. As the number of tailored applications of learning models increa...
Jiyue Wang, Pei Zhang, Qianhua He, Yanxiong Li and Yongjian Hu
Label Smoothing Regularization (LSR) is a widely used tool to generalize classification models by replacing the one-hot ground truth with smoothed labels. Recent research on LSR has increasingly focused on the correlation between the LSR and Knowledge Di...