Inicio  /  Applied Sciences  /  Vol: 12 Par: 5 (2022)  /  Artículo
ARTÍCULO
TITULO

Multi-Scale Features for Transformer Model to Improve the Performance of Sound Event Detection

Soo-Jong Kim and Yong-Joo Chung    

Resumen

To alleviate the problem of performance degradation due to the varied sound durations of competing classes in sound event detection, we propose a method that utilizes multi-scale features for sound event detection. We employed a feature-pyramid component in a deep neural network architecture based on the Transformer encoder that is used to efficiently model the time correlation of sound signals because of its superiority over conventional recurrent neural networks, as demonstrated in recent studies. We used layers of convolutional neural networks to produce two-dimensional acoustic features that are input into the Transformer encoders. The outputs of the Transformer encoders at different levels of the network are combined to obtain the multi-scale features to feed the fully connected feed-forward neural network, which acts as the final classification layer. The proposed method is motivated by the idea that multi-scale features make the network more robust against the dynamic duration of the sound signals depending on their classes. We also applied the proposed method to a mean-teacher model, based on the Transformer encoder, to demonstrate its effectiveness on a large set of unlabeled data. We conducted experiments using the DCASE 2019 Task 4 dataset to evaluate the performance of the proposed method. The experimental results show that the proposed architecture outperforms the baseline network without multi-scale features.

 Artículos similares

       
 
Qiyan Li, Zhi Weng, Zhiqiang Zheng and Lixin Wang    
The decrease in lake area has garnered significant attention within the global ecological community, prompting extensive research in remote sensing and computer vision to accurately segment lake areas from satellite images. However, existing image segmen... ver más
Revista: Applied Sciences

 
Ruoyang Li, Shuping Xiong, Yinchao Che, Lei Shi, Xinming Ma and Lei Xi    
Semantic segmentation algorithms leveraging deep convolutional neural networks often encounter challenges due to their extensive parameters, high computational complexity, and slow execution. To address these issues, we introduce a semantic segmentation ... ver más
Revista: Algorithms

 
Jiao Su, Yi An, Jialin Wu and Kai Zhang    
Pedestrian detection has always been a difficult and hot spot in computer vision research. At the same time, pedestrian detection technology plays an important role in many applications, such as intelligent transportation and security monitoring. In comp... ver más
Revista: Algorithms

 
Changhong Liu, Jiawen Wen, Jinshan Huang, Weiren Lin, Bochun Wu, Ning Xie and Tao Zou    
Underwater object detection is crucial in marine exploration, presenting a challenging problem in computer vision due to factors like light attenuation, scattering, and background interference. Existing underwater object detection models face challenges ... ver más

 
Yiming Mo, Lei Wang, Wenqing Hong, Congzhen Chu, Peigen Li and Haiting Xia    
The intrusion of foreign objects on airport runways during aircraft takeoff and landing poses a significant safety threat to air transportation. Small-scale Foreign Object Debris (FOD) cannot be ruled out on time by traditional manual inspection, and the... ver más
Revista: Applied Sciences