Abstract
This paper presents an echo suppression system that combines a linear acoustic echo canceller (AEC) with a deep complex convolutional recurrent network (DCCRN) for residual echo suppression. The filter taps of the AEC are adapted in subbands using the normalized sign-error least mean squares (NSLMS) algorithm. The NSLMS is compared with the commonly used normalized least mean squares (NLMS) algorithm, and the combination of each with the proposed deep residual echo suppression model is studied. The use of a pre-trained deep-learning speech denoising model as an alternative to a residual echo suppressor (RES) is also studied. The results showed that the NSLMS outperformed the NLMS in all settings. With the NSLMS output, the proposed RES achieved better performance than the larger pre-trained speech denoiser. More notably, the denoiser performed considerably better on the NSLMS output than on the NLMS output, and this performance gap was greater than the corresponding gap observed with the RES, indicating that the residual echo in the NSLMS output was more akin to noise than to speech. Therefore, when little data is available to train an RES, a pre-trained speech denoiser is a viable alternative, provided that the preceding linear AEC employs the NSLMS.
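To make the difference between the two adaptation rules concrete, the following is a minimal illustrative sketch, not the implementation described in the paper. It assumes a real-valued, time-domain FIR filter rather than the complex subband filters used in the system, and it assumes one common form of the sign-error update in which the step is normalized by the Euclidean norm of the reference buffer. The function name `adapt_filter`, the step sizes, and the toy echo path `h` are hypothetical choices made for illustration only.

```python
import numpy as np


def adapt_filter(x, d, taps, mu, sign_error=False, eps=1e-8):
    """Adapt an FIR echo-path estimate; return (error signal, final taps).

    x: far-end reference, d: microphone signal (both 1-D real arrays).
    sign_error=False gives the standard NLMS update.
    sign_error=True gives an assumed normalized sign-error variant.
    """
    w = np.zeros(taps)      # current echo-path estimate
    buf = np.zeros(taps)    # most recent reference samples, newest first
    e = np.zeros(len(x))    # a priori error (echo-cancelled output)
    for n in range(len(x)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        e[n] = d[n] - w @ buf
        if sign_error:
            # Assumed form: only the sign of the error is used, and the
            # step is normalized by the norm of the reference buffer.
            w += mu * np.sign(e[n]) * buf / (np.linalg.norm(buf) + eps)
        else:
            # NLMS: the error itself is used, scaled by the reference energy.
            w += mu * e[n] * buf / (buf @ buf + eps)
    return e, w


# Hypothetical toy setup: white-noise far-end signal, 4-tap echo path,
# no near-end speech and no background noise.
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h)[: len(x)]

_, w_nlms = adapt_filter(x, d, taps=4, mu=0.5)
_, w_nslms = adapt_filter(x, d, taps=4, mu=0.01, sign_error=True)
print("NLMS misalignment: ", np.linalg.norm(w_nlms - h))
print("NSLMS misalignment:", np.linalg.norm(w_nslms - h))
```

In this sketch the sign-error step has a fixed length of `mu` regardless of how large the error is, so the estimate settles into a small neighbourhood of the true echo path rather than converging exactly. The toy comparison says nothing about the performance results reported in the paper, which concern subband processing of real speech.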