ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling
This is the demo page for ComplexDec [paper]
Abstract
Neural audio codecs have been widely adopted in audio-generative tasks because their compact and discrete representations are suitable for both large-language-model-style and regression-based generative models. However, most neural codecs struggle to model out-of-domain audio, resulting in error propagation to downstream generative tasks. In this paper, we first argue that information loss from codec compression degrades out-of-domain robustness. Then, we propose full-band 48 kHz ComplexDec with complex spectral input and output to ease the information loss while adopting the same 24 kbps bitrate as the baselines AudioDec and ScoreDec. Objective and subjective evaluations demonstrate the out-of-domain robustness of ComplexDec trained using only the 30-hour VCTK corpus.
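To make "complex spectral input and output" concrete, the sketch below converts a waveform into a two-channel real/imaginary spectral feature and inverts it again. It is only an illustration under stated assumptions: it assumes an STFT front end, and the function names `stft_complex` and `istft_complex`, the FFT size, and the hop size are hypothetical choices, not the configuration used by ComplexDec.

```python
import numpy as np

def stft_complex(x, n_fft=1024, hop=256):
    """Return real and imaginary STFT parts, shape (2, frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft + 1)[:-1]        # periodic Hann window
    pad = n_fft // 2
    x = np.pad(x, (pad, pad), mode="reflect")  # center the frames on the signal
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    spec = np.fft.rfft(frames, axis=-1)
    # Keeping both parts preserves phase, unlike a magnitude-only feature.
    return np.stack([spec.real, spec.imag])

def istft_complex(feat, n_fft=1024, hop=256, length=None):
    """Invert stft_complex by windowed overlap-add."""
    spec = feat[0] + 1j * feat[1]
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * window
    n_frames = frames.shape[0]
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    out = out / np.maximum(norm, 1e-8)         # undo the squared-window weighting
    out = out[n_fft // 2 :]                    # drop the reflect padding
    return out[:length] if length is not None else out

# Hypothetical usage: one second of a 440 Hz tone at 48 kHz.
fs = 48000
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)
feature = stft_complex(tone)
restored = istft_complex(feature, length=len(tone))
```

Without any quantization in between, the round trip is lossless up to floating-point error, so in a codec built on such a feature the only information loss would come from the compression stage itself.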
Architecture

Demo Sounds
- Out-of-domain test corpus: EARS (fs: 48 kHz)
- All codec bitrates: 24 kbps
Codec | Amusing | Anger |
---|---|---|
Natural 48 kHz | ||
Natural 24 kHz | ||
AudioDec (in-domain) | ||
ScoreDec (in-domain) | ||
ComplexDec (in-domain) | ||
AudioDec (out-of-domain) | ||
ScoreDec (out-of-domain) | ||
ComplexDec (out-of-domain) | ||
Encodec 48 kHz | ||
Encodec 24 kHz | ||
DAC 24 kHz | ||
Codec | Reading | Loud |
---|---|---|
Natural 48 kHz | ||
Natural 24 kHz | ||
AudioDec (in-domain) | ||
ScoreDec (in-domain) | ||
ComplexDec (in-domain) | ||
AudioDec (out-of-domain) | ||
ScoreDec (out-of-domain) | ||
ComplexDec (out-of-domain) | ||
Encodec 48 kHz | ||
Encodec 24 kHz | ||
DAC 24 kHz | ||
Codec | Reading | Whisper |
---|---|---|
Natural 48 kHz | ||
Natural 24 kHz | ||
AudioDec (in-domain) | ||
ScoreDec (in-domain) | ||
ComplexDec (in-domain) | ||
AudioDec (out-of-domain) | ||
ScoreDec (out-of-domain) | ||
ComplexDec (out-of-domain) | ||
Encodec 48 kHz | ||
Encodec 24 kHz | ||
DAC 24 kHz | ||
Speech Quality Measurements
ComplexDec achieves similar coding quality on in-domain and out-of-domain speech, whereas AudioDec and ScoreDec degrade significantly on out-of-domain speech. ComplexDec also significantly outperforms the open-source Encodec models. These results indicate that severe information loss cannot be fully compensated for by the score-based diffusion post-filter (SPF) of ScoreDec or by simply increasing the amount of training data. DAC also achieves impressive out-of-domain robustness because of its low compression ratio. However, the marked quality gap between ComplexDec and DAC reflects the significant perceptual quality difference between 48 kHz and 24 kHz speech.
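The compression-ratio argument can be illustrated with a rough calculation. The sketch below assumes 16-bit mono PCM as the uncompressed reference, which is an assumption made here for illustration and is not stated on this page; the helper names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth=16, channels=1):
    """Bitrate of uncompressed PCM audio in kbps (bit depth is an assumption)."""
    return sample_rate_hz * bit_depth * channels / 1000

def compression_ratio(sample_rate_hz, codec_kbps, bit_depth=16, channels=1):
    """Ratio of the uncompressed PCM bitrate to the codec bitrate."""
    return pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels) / codec_kbps

# All codecs on this page run at 24 kbps; only the sampling rate differs.
for name, fs in [("48 kHz codec", 48000), ("24 kHz codec", 24000)]:
    print(name, compression_ratio(fs, codec_kbps=24.0))
# prints "48 kHz codec 32.0" and "24 kHz codec 16.0"
```

Under this assumption, a 24 kHz codec at 24 kbps compresses half as aggressively as a 48 kHz codec at the same bitrate, at the cost of discarding everything above 12 kHz.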

Out-of-domain Magnitude Spectral Comparison
AudioDec fails to reconstruct the harmonic structures, and the resulting blurry spectrum leads to hoarse-sounding speech. Although the SPF of ScoreDec can slightly sharpen the blurry spectrum thanks to its diffusion-based nature, it cannot recover the missing harmonics well. In contrast, ComplexDec preserves the harmonic structures below 6 kHz well because it loses less information.
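A comparison of this kind can be reproduced with a log-magnitude spectrogram restricted to the band below 6 kHz. The sketch below is a minimal illustration, not the plotting code behind the figure: the function name, FFT size, and hop size are assumptions, and a synthetic harmonic tone stands in for a decoded speech sample.

```python
import numpy as np

def log_magnitude_spectrogram(x, fs, n_fft=2048, hop=512):
    """Return bin frequencies in Hz and a (frames, bins) log-magnitude array in dB."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    mag = np.abs(np.fft.rfft(frames, axis=-1))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 20 * np.log10(mag + 1e-8)   # small offset avoids log(0)

# Synthetic stand-in for speech: 200 Hz fundamental with ten harmonics.
fs = 48000
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 11))

freqs, spec_db = log_magnitude_spectrogram(x, fs)
band = freqs < 6000                           # region discussed in the text
mean_db = spec_db[:, band].mean(axis=0)       # time-averaged spectrum in the band
```

Clear, evenly spaced peaks in `mean_db` correspond to preserved harmonics; a codec that blurs the spectrum would flatten those peaks.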

Liability Disclaimer
The demo page uses a public speech dataset (EARS) for demonstration purposes only. The content of the demo files is provided "as is" and for general informational purposes only. We make no warranties regarding its accuracy or suitability. If you believe that any speech samples infringe upon your rights or violate any laws, please contact us to remove the demo files. We are not liable for any damages arising from the use of or reliance on our demo page or open-source code. By accessing the demo page, you agree to release us from any claims or liabilities related to its use.
Page layout is modified from cayman-theme and cayman-blog. LICENSE