Marius Rodrigues, Louis Bahrman, Roland Badeau, Gaël Richard
LTCI, Télécom Paris, Institut Polytechnique de Paris, France
Submitted to ICASSP 2026
Distribution of the Fourier coefficients of synthetic room impulse responses in the complex plane, at several frequencies.
In unsupervised or weakly supervised approaches to speech dereverberation, the target clean (dry) signals are considered unknown during training. In that context, it becomes critical to evaluate how much information can be retrieved from reverberant (wet) speech alone. This work investigates the role of the wet phase in the time–frequency domain. Based on Statistical Wave Field Theory, we show that late reverberation perturbs phase components with white, uniformly distributed noise, except at low frequencies. Consequently, the wet phase carries limited useful information and is not essential for weakly supervised dereverberation. To validate this finding, we train dereverberation models under a recent weak supervision framework and demonstrate that performance can be significantly improved by excluding the reverberant phase from the loss function.
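The uniform-phase claim can be illustrated numerically. The sketch below is not the Statistical Wave Field Theory derivation used in the paper: it assumes a much simpler, hypothetical late-reverberation model (white Gaussian noise under an exponential envelope that reaches -60 dB at RT60), and all function names are illustrative. Because this toy model has no direct path or modal structure, it does not reproduce the low-frequency exception mentioned above; it only shows that the phases of the Fourier coefficients of such tails spread evenly around the circle.

```python
import numpy as np

def synthetic_rirs(n_rooms, rt60, fs=16000, length_s=1.0, seed=0):
    """Hypothetical late-reverberation model (an assumption, not the paper's):
    white Gaussian noise shaped by an envelope reaching -60 dB at t = rt60."""
    rng = np.random.default_rng(seed)
    t = np.arange(int(fs * length_s)) / fs
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    return rng.standard_normal((n_rooms, t.size)) * envelope, t

def fourier_coefficients(rirs, t, freq_hz):
    """Single-frequency Fourier coefficient of each impulse response."""
    return rirs @ np.exp(-2j * np.pi * freq_hz * t)

def mean_resultant_length(coeffs):
    """Near 0 for uniformly distributed phases, 1 for identical phases."""
    return float(np.abs(np.mean(coeffs / np.abs(coeffs))))

rirs, t = synthetic_rirs(n_rooms=400, rt60=0.54)
for freq_hz in (250.0, 1000.0, 4000.0):
    r = mean_resultant_length(fourier_coefficients(rirs, t, freq_hz))
    print(f"{freq_hz:6.0f} Hz: mean resultant length = {r:.3f}")
```

A mean resultant length close to zero at every tested frequency is what a uniform phase distribution would produce; a value close to one would indicate that the phase is concentrated around a single angle.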
Below are audio examples comparing the dereverberation performance of several models. The models are trained under a weakly supervised framework with different loss configurations: with or without the reverberant phase in the loss function, and with or without logarithmic compression of the magnitude spectrograms. Regardless of the loss configuration, FSN estimates the complex STFT of the dry signal. PI-FSN is a variant of FSN that estimates only the magnitude spectrogram and uses the reverberant phase for reconstruction. You can listen to the wet input, the ground-truth dry signal, and the outputs of the different models for three reverberation times (RT60). Headphones are recommended for a better experience. Input wet signals come from the EARS-Reverb dataset.
| | Wet input | Ground truth | FSN | FSN | FSN | FSN | PI-FSN |
| Phase-inv.? | | | ✗ | ✗ | ✓ | ✓ | ✓ |
| log-comp.? | | | ✗ | ✓ | ✗ | ✓ | ✓ |
| RT60 = 0.32 s | | | | | | | |
| RT60 = 0.54 s | | | | | | | |
| RT60 = 0.90 s | | | | | | | |
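The two loss switches compared above (phase-invariant or not, log-compressed or not) can be sketched as a single spectral loss. This page does not spell out the loss used in the paper, so the code below is only one plausible reading: `spectral_loss` and `compress` are hypothetical names, the mean squared error is an assumed distance, and applying `log(1 + |X|)` to the magnitude while keeping the phase is an assumed way of combining compression with a phase-aware loss.

```python
import numpy as np

def compress(stft, log_compress):
    """Optionally apply log(1 + |X|) to the magnitude, keeping the phase
    (an assumed form of logarithmic compression)."""
    if not log_compress:
        return stft
    return np.log1p(np.abs(stft)) * np.exp(1j * np.angle(stft))

def spectral_loss(est_wet, ref_wet, phase_invariant=True, log_compress=True):
    """Mean squared error between two complex STFTs.
    phase_invariant=True compares magnitudes only, so the reverberant
    phase never enters the loss."""
    est = compress(est_wet, log_compress)
    ref = compress(ref_wet, log_compress)
    if phase_invariant:
        diff = np.abs(est) - np.abs(ref)
    else:
        diff = np.abs(est - ref)
    return float(np.mean(diff ** 2))

# Toy check: same magnitudes, phases scrambled uniformly at random.
rng = np.random.default_rng(0)
ref = rng.standard_normal((257, 50)) + 1j * rng.standard_normal((257, 50))
est = ref * np.exp(1j * rng.uniform(-np.pi, np.pi, ref.shape))
loss_phase_invariant = spectral_loss(est, ref, phase_invariant=True)
loss_phase_aware = spectral_loss(est, ref, phase_invariant=False)
print(loss_phase_invariant, loss_phase_aware)
```

In this toy check the phase-invariant variant ignores the scrambled phase entirely, while the phase-aware variant penalizes it even though the magnitudes match exactly.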
If you find this work useful, please cite it using the following BibTeX entry:
Coming soon