Abstract
In multi-lingual, multi-speaker environments (e.g., international conference scenarios), speech from different speakers and languages can overlap with background sounds, so source separation techniques are needed to extract the target sound in real-world conditions. Downstream tasks such as automatic speech recognition (ASR), speaker recognition, and voice activity detection (VAD) can be combined with speech separation to obtain a better overall understanding of the audio. Since most evaluation methods for single-channel separation are either narrow in scope or subjective, this paper used a downstream recognition task as the overall evaluation criterion, so that separation performance could be assessed directly through the metrics of that task. We investigated a two-stage training scheme that combined speech separation and language identification. To analyze and optimize the separation of single-channel overlapping speech, the separated speech was fed to a language identification engine and its accuracy was evaluated. The speech separation model was a single-channel separation network trained on WSJ0-2mix. For the language identification system, we used an Oriental Language Dataset together with a dataset synthesized by directly mixing speech groups in different proportions. The combined effect of the two models was evaluated across various overlapping speech scenarios. When the language identification model was based on spectral features of single-speaker, single-utterance audio, the recognition results for Chinese, Japanese, Korean, Indonesian, and Vietnamese improved significantly over those obtained from the mixed-audio spectrum.