Is Phase Really Needed for Weakly-Supervised Dereverberation?

Marius Rodrigues, Louis Bahrman, Roland Badeau, Gaël Richard

LTCI, Télécom Paris, Institut Polytechnique de Paris, France

Submitted to ICASSP 2026



Distribution of the Fourier coefficients of a synthetic Room Impulse Response (RIR) in the complex plane, at several frequencies.

Abstract

In unsupervised or weakly-supervised approaches to speech dereverberation, the target clean (dry) signals are considered unknown during training. In that context, evaluating to what extent information can be retrieved from the sole knowledge of the reverberant (wet) speech becomes critical. This work investigates the role of the wet phase in the time–frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly-supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak-supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.
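The intuition behind this claim can be checked numerically. The short sketch below is ours, not the paper's code: it draws a synthetic late-reverberation tail as exponentially decaying white Gaussian noise (a standard statistical model, assumed here purely for illustration) and verifies that the phases of its Fourier coefficients are close to uniform on [-pi, pi).

# Minimal sketch (assumption: Polack-style model of the late tail as
# exponentially decaying white Gaussian noise; not the paper's experiment).
import numpy as np

rng = np.random.default_rng(0)
fs = 16000          # sampling rate in Hz, arbitrary for this illustration
rt60 = 0.5          # assumed reverberation time in seconds
n = int(fs * rt60)  # RIR length of about one RT60

# Amplitude envelope reaching -60 dB at t = RT60
t = np.arange(n) / fs
decay = np.exp(-3.0 * np.log(10) * t / rt60)
rir = rng.standard_normal(n) * decay

# Fourier coefficients of the synthetic RIR and their phases
H = np.fft.rfft(rir)
phases = np.angle(H)

# A uniform phase distribution has a flat histogram over [-pi, pi),
# with density 1 / (2 * pi), i.e. roughly 0.159 in every bin.
hist, _ = np.histogram(phases, bins=16, range=(-np.pi, np.pi), density=True)
print(np.round(hist, 3))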

Audio Examples

Below are audio examples comparing the dereverberation performance of several models. The models are trained under a weakly-supervised framework, in different configurations: with or without the reverberant phase in the loss function, and with or without logarithmic compression of the magnitude spectrograms (a minimal sketch of the two loss variants is given after the table below). Regardless of the loss configuration, FSN estimates the complex STFT of the dry signal. PI-FSN is a variant of FSN that only estimates the magnitude spectrogram and uses the reverberant phase for reconstruction. You can listen to the wet input, the ground-truth dry signal, and the outputs of the different models for three reverberation times (RT60). Headphones are recommended. Input wet signals come from the EARS-Reverb dataset.

Audio examples (one row per reverberation time, one column per signal): wet input, ground truth, four FSN configurations (with or without the phase-invariant loss, with or without log compression), and PI-FSN, for RT60 = 0.32 s, 0.54 s, and 0.90 s.
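For readers who want a concrete picture of what "with or without the reverberant phase in the loss" means, here is a hypothetical sketch in PyTorch; the function name, signature, and the choice of an L1 distance are ours, not the paper's. The phase-aware variant penalises the distance between complex STFTs, so the wet phase enters the target; the phase-invariant variant compares magnitude spectrograms only, optionally after logarithmic compression.

# Illustrative loss sketch (hypothetical, not the training objective of the paper).
import torch

def spectrogram_loss(est_stft, ref_stft, phase_invariant=True, log_compress=True, eps=1e-8):
    """est_stft, ref_stft: complex tensors of shape (batch, freq, frames)."""
    if phase_invariant:
        est, ref = est_stft.abs(), ref_stft.abs()
        if log_compress:
            est, ref = torch.log(est + eps), torch.log(ref + eps)
        return (est - ref).abs().mean()        # L1 on (log-)magnitudes, phase discarded
    return (est_stft - ref_stft).abs().mean()  # L1 on complex STFTs, phase included

# Toy usage on random complex spectrograms
est = torch.randn(2, 257, 100, dtype=torch.complex64)
ref = torch.randn(2, 257, 100, dtype=torch.complex64)
print(spectrogram_loss(est, ref, phase_invariant=True).item())
print(spectrogram_loss(est, ref, phase_invariant=False).item())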

Citation

If you find this work useful, please cite it using the following BibTeX entry:

Coming soon