Acoustics, Vol. 5, No. 3 (2023)
ARTICLE
TITLE

On Training Targets and Activation Functions for Deep Representation Learning in Text-Dependent Speaker Verification

Achintya Kumar Sarkar and Zheng-Hua Tan    

Abstract

Deep representation learning has gained significant momentum in advancing text-dependent speaker verification (TD-SV) systems. When designing deep neural networks (DNNs) for extracting bottleneck (BN) features, the key considerations include training targets, activation functions, and loss functions. In this paper, we systematically study the impact of these choices on the performance of TD-SV. For training targets, we consider speaker identity, time-contrastive learning (TCL), and autoregressive predictive coding (APC), with the first being supervised and the last two being self-supervised. Furthermore, we study a range of loss functions when speaker identity is used as the training target. With regard to activation functions, we study the widely used sigmoid function, the rectified linear unit (ReLU), and the Gaussian error linear unit (GELU). We experimentally show that GELU reduces the error rates of TD-SV significantly compared to sigmoid, irrespective of the training target. Among the three training targets, TCL performs the best. Among the various loss functions, cross-entropy, joint-softmax, and focal loss outperform the others. Finally, score-level fusion of different systems also reduces the error rates. Experiments are conducted on the RedDots 2016 challenge database, which consists of short utterances, using TD-SV systems based on the classic Gaussian mixture model-universal background model (GMM-UBM) and i-vector methods.
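As a point of reference for the activation and loss functions named in the abstract, the following minimal sketch gives their standard textbook definitions in plain Python. It is not the authors' implementation: the function names, the scalar and list-based interfaces, and the default focusing parameter `gamma=2.0` are assumptions made here for illustration, and the abstract does not state which GELU variant (exact or approximate) or which focal loss settings the paper uses.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes x into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Cross-entropy for one example, given class posteriors `probs`."""
    return -math.log(probs[target])


def focal_loss(probs: Sequence[float], target: int, gamma: float = 2.0) -> float:
    """Focal loss for one example.

    Down-weights well-classified examples by the factor (1 - p_t)^gamma.
    gamma=2.0 is an assumed illustrative default, not a value from the paper.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


if __name__ == "__main__":
    # Compare the three activations on a few inputs.
    for x in (-2.0, 0.0, 2.0):
        print(x, sigmoid(x), relu(x), gelu(x))

    # Hypothetical posteriors over three speakers; the true speaker is index 0.
    posteriors = [0.7, 0.2, 0.1]
    print(cross_entropy(posteriors, 0), focal_loss(posteriors, 0))
```

Unlike ReLU, GELU is smooth and lets small negative inputs through with a reduced weight. With `gamma` set to 0, the focal loss reduces to ordinary cross-entropy.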

Similar articles

Quan Sun, Xuhui Pan, Xiao Ling, Bo Wang, Qinghong Sheng, Jun Li, Zhijun Yan, Ke Yu and Jiasong Wang    
In the realm of non-cooperative space security and on-orbit service, a significant challenge is accurately determining the pose of abandoned satellites using imaging sensors. Traditional methods for estimating the position of the target encounter problem...
Journal: Aerospace

 
Juan Xu, Meng Ding, Zhen-Zhen Zhang, Yu-Bin Xu, Xu-Hui Wang and Fan Zhao    
The automatic collection of key milestone nodes in the process of aircraft turnaround plays an important role in the development needs of airport collaborative decision-making. This article exploits a computer vision-based framework to automatically reco...
Journal: Applied Sciences

 
Suhare Solaiman, Emad Alsuwat and Rajwa Alharthi    
In this paper, a framework for simultaneous tracking and recognizing drone targets using a low-cost and small-sized millimeter-wave radar is presented. The radar collects the reflected signals of multiple targets in the field of view, including drone and...

 
Zhuo Wang, Haojie Chen, Hongde Qin and Qin Chen    
In the computer vision field, underwater object detection has been a challenging task. Due to the attenuation of light in a medium and the scattering of light by suspended particles in water, underwater optical images often face the problems of color dis...

 
Jiqing Du, Dan Zhou, Wei Wang and Sachiyo Arai    
The Deep Reinforcement Learning (DRL) algorithm is an optimal control method with generalization capacity for complex nonlinear coupled systems. However, the DRL agent maintains control command saturation and response overshoot to achieve the fastest res...