Algorithms, Vol. 16, No. 7 (2023)

Audio Anti-Spoofing Based on Audio Feature Fusion

Jiachen Zhang, Guoqing Tu, Shubo Liu and Zhaohui Cai

Abstract

The rapid development of speech synthesis technology has greatly improved the naturalness and human-likeness of synthetic speech. As the technical barrier to speech synthesis falls, illegal activities such as fraud and extortion are increasing, posing a serious threat to authentication systems such as automatic speaker verification. To keep pace with constantly evolving synthesis techniques and to detect synthetic speech more accurately, this paper proposes an end-to-end synthetic speech detection model based on audio feature fusion. The model uses a pre-trained wav2vec2 model to extract features from raw waveforms and an audio feature fusion module for back-end classification. The fusion module aims to improve accuracy by making full use of the front-end features, fusing information across both time frames and feature dimensions. Data augmentation is also used to improve the model's generalization. The model is trained on the training and development sets of the logical access (LA) dataset of the ASVspoof 2019 Challenge, an international benchmark, and tested on the LA and deep-fake (DF) evaluation datasets of the ASVspoof 2021 Challenge. The equal error rates (EERs) on ASVspoof 2021 LA and ASVspoof 2021 DF are 1.18% and 2.62%, respectively, achieving the best result on the DF dataset.
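The EER figures above mark the operating point at which the false rejection rate for bona fide speech equals the false acceptance rate for spoofed speech. The following is a minimal sketch of how such a figure can be computed from detector scores, assuming higher scores mean "more likely bona fide". `compute_eer` is a hypothetical helper, not the authors' or the ASVspoof organizers' evaluation code; it simply picks the observed threshold where the two error rates are closest, whereas official scoring tools may sweep or interpolate differently.

```python
from typing import Sequence, Tuple


def compute_eer(bonafide_scores: Sequence[float],
                spoof_scores: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a bona fide vs. spoof detector.

    Assumption: a higher score means the trial is more likely bona fide.
    Hypothetical helper for illustration only.
    """
    if not bonafide_scores or not spoof_scores:
        raise ValueError("both score lists must be non-empty")

    # Use every observed score as a candidate decision threshold.
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    n_bona = len(bonafide_scores)
    n_spoof = len(spoof_scores)

    best = None  # (|FRR - FAR|, EER estimate, threshold)
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / n_bona
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / n_spoof
        gap = abs(frr - far)
        # Keep the threshold where FRR and FAR are closest.
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2, t)

    return best[1], best[2]


if __name__ == "__main__":
    eer, thr = compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])
    # EER is a fraction; the abstract reports it as a percentage.
    print(f"EER = {eer * 100:.2f}% at threshold {thr}")
```

On the toy scores in the usage example, one bona fide trial (0.4) and one spoofed trial (0.6) overlap, so the rates cross at 25%. A reported EER of 1.18% corresponds to a returned value of 0.0118.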

Similar articles

       
 
Jih-Ching Chiu, Guan-Yi Lee, Chih-Yang Hsieh and Qing-You Lin    
In computer vision and image processing, the shift from traditional cameras to emerging sensing tools, such as gesture recognition and object detection, addresses privacy concerns. This study navigates the Integrated Sensing and Communication (ISAC) era, ...

 
Maryam Omar, Hafeez Ur Rehman, Omar Bin Samin, Moutaz Alazab, Gianfranco Politano and Alfredo Benso    
Text-to-image synthesis is one of the most critical and challenging problems of generative modeling. It is of substantial importance in the area of automatic learning, especially for image creation, modification, analysis and optimization. A number of wo...
Journal: Information

 
Jialin Zhang, Mairidan Wushouer, Gulanbaier Tuerhong and Hanfang Wang    
Emotional speech synthesis is an important branch of human–computer interaction technology that aims to generate emotionally expressive and comprehensible speech based on the input text. With the rapid development of speech synthesis technology based on ...
Journal: Applied Sciences

 
Qianmu Xiao and Liang Zhao    
Acquiring relevant, high-quality, and heterogeneous medical images is essential in various types of automated analysis, used for a variety of downstream data augmentation tasks. However, a large number of real image samples are expensive to obtain, espec...
Journal: Applied Sciences

 
Sergey Sakulin and Alexander Alfimtsev    
The modern tourist industry is characterized by an abundance of applied multicriteria decision-making tasks. Several researchers have demonstrated that such tasks can be effectively resolved using aggregation operators based on fuzzy integrals and fuzzy ...