Is Phase Really Needed for Weakly-Supervised Dereverberation?

Marius Rodrigues, Louis Bahrman, Roland Badeau, Gaël Richard

LTCI, Télécom Paris, Institut Polytechnique de Paris, France

Submitted to ICASSP 2026



Distribution of the Fourier coefficients of a synthetic Room Impulse Response (RIR) in the complex plane, at several frequencies.

Abstract

In unsupervised or weakly-supervised approaches to speech dereverberation, the target clean (dry) signals are considered unknown during training. In that context, evaluating to what extent information can be retrieved from the sole knowledge of the reverberant (wet) speech becomes critical. This work investigates the role of the wet phase in the time–frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly-supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak-supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.
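The intuition behind this claim can be checked numerically. The short sketch below is ours, not the paper's code: it draws a synthetic late-reverberation tail as exponentially decaying white Gaussian noise (a standard statistical model, assumed here purely for illustration) and verifies that the phases of its Fourier coefficients are close to uniform on [-pi, pi).

# Minimal sketch (assumption: Polack-style model of the late tail as
# exponentially decaying white Gaussian noise; not the paper's experiment).
import numpy as np

rng = np.random.default_rng(0)
fs = 16000          # sampling rate in Hz, arbitrary for this illustration
rt60 = 0.5          # assumed reverberation time in seconds
n = int(fs * rt60)  # RIR length of about one RT60

# Amplitude envelope reaching -60 dB at t = RT60
t = np.arange(n) / fs
decay = np.exp(-3.0 * np.log(10) * t / rt60)
rir = rng.standard_normal(n) * decay

# Fourier coefficients of the synthetic RIR and their phases
H = np.fft.rfft(rir)
phases = np.angle(H)

# A uniform phase distribution has a flat histogram over [-pi, pi),
# with density 1 / (2 * pi), i.e. roughly 0.159 in every bin.
hist, _ = np.histogram(phases, bins=16, range=(-np.pi, np.pi), density=True)
print(np.round(hist, 3))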

Audio Examples

Below are audio examples comparing the dereverberation performance of several models. The models are trained under a weakly-supervised framework, in different configurations: with or without the reverberant phase in the loss function, and with or without logarithmic compression of the magnitude spectrograms (a minimal sketch of the two loss variants is given after the table below). Regardless of the loss configuration, FSN estimates the complex STFT of the dry signal. PI-FSN is a variant of FSN that only estimates the magnitude spectrogram and uses the reverberant phase for reconstruction. You can listen to the wet input, the ground-truth dry signal, and the outputs of the different models for three reverberation times (RT60). Headphones are recommended. Input wet signals come from the EARS-Reverb dataset.

Audio examples (one row per reverberation time, one column per signal): wet input, ground truth, four FSN configurations (with or without the phase-invariant loss, with or without log compression), and PI-FSN, for RT60 = 0.32 s, 0.54 s, and 0.90 s.
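For readers who want a concrete picture of what "with or without the reverberant phase in the loss" means, here is a hypothetical sketch in PyTorch; the function name, signature, and the choice of an L1 distance are ours, not the paper's. The phase-aware variant penalises the distance between complex STFTs, so the wet phase enters the target; the phase-invariant variant compares magnitude spectrograms only, optionally after logarithmic compression.

# Illustrative loss sketch (hypothetical, not the training objective of the paper).
import torch

def spectrogram_loss(est_stft, ref_stft, phase_invariant=True, log_compress=True, eps=1e-8):
    """est_stft, ref_stft: complex tensors of shape (batch, freq, frames)."""
    if phase_invariant:
        est, ref = est_stft.abs(), ref_stft.abs()
        if log_compress:
            est, ref = torch.log(est + eps), torch.log(ref + eps)
        return (est - ref).abs().mean()        # L1 on (log-)magnitudes, phase discarded
    return (est_stft - ref_stft).abs().mean()  # L1 on complex STFTs, phase included

# Toy usage on random complex spectrograms
est = torch.randn(2, 257, 100, dtype=torch.complex64)
ref = torch.randn(2, 257, 100, dtype=torch.complex64)
print(spectrogram_loss(est, ref, phase_invariant=True).item())
print(spectrogram_loss(est, ref, phase_invariant=False).item())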

Citation

If you find this work useful, please cite it using the following BibTeX entry:

Coming soon