ScoreDec: A Phase-preserving High-Fidelity Audio Codec with A Generalized Score-based Diffusion Post-filter

Yi-Chiao Wu, Dejan Marković, Steven Krenn, Israel D. Gebru, and Alexander Richard
Meta Reality Labs Research, USA

This page presents audio demos for ScoreDec [paper].

Abstract

Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality at a very low bitrate, the quality gap between coded and natural audio is still significant. Generative adversarial network (GAN) training is usually required for these E2E neural codecs because of the difficulty of direct phase modeling. However, such adversarial learning hinders these codecs from preserving the original phase information. To achieve human-level naturalness at a reasonable bitrate, preserve the original phase, and avoid tricky and opaque GAN training, we develop a score-based diffusion post-filter (SPF) in the complex spectral domain and combine our previous AudioDec with the SPF to propose ScoreDec, which can be trained using only spectral and score-matching losses. Both objective and subjective experimental results show that ScoreDec at a 24 kbps bitrate encodes and decodes full-band 48 kHz speech with human-level naturalness and well-preserved phase information.
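The abstract says the SPF operates on complex spectra and is trained with a score-matching loss. As a rough illustration only, the sketch below shows generic denoising score matching (DSM) on complex STFT-like arrays, assuming a simple Gaussian perturbation x_t = x_0 + σz. It is not the paper's "generalized" diffusion process, and `perturb`, `dsm_loss`, and the `score_fn(x_t, y, sigma)` interface (with `y` standing in for the coded spectrum used as conditioning) are hypothetical names chosen for this example. The spectral losses mentioned in the abstract are not shown.

```python
import numpy as np


def perturb(x0, sigma, rng):
    """Add circularly symmetric complex Gaussian noise (std sigma on real and imag parts)."""
    z = rng.standard_normal(x0.shape) + 1j * rng.standard_normal(x0.shape)
    return x0 + sigma * z, z


def dsm_loss(score_fn, x0, y, sigma, rng):
    """Sigma^2-weighted denoising score-matching loss on complex spectra.

    Assumed perturbation kernel: p(x_t | x0) = N(x0, sigma^2 I) on the real and
    imaginary parts independently, whose score is -(x_t - x0) / sigma^2 = -z / sigma.
    score_fn is a hypothetical conditional score model taking (x_t, y, sigma).
    """
    x_t, z = perturb(x0, sigma, rng)
    target = -z / sigma
    err = score_fn(x_t, y, sigma) - target
    # Weighting by sigma^2 keeps the loss on a comparable scale across noise levels.
    return float(np.mean(np.abs(err) ** 2) * sigma**2)
```

As a sanity check, an oracle score that knows x_0 drives the loss to about zero, while an all-zero model gives a weighted loss of roughly E|z|² = 2 for this complex noise.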

Architecture

Demo Sounds

VCTK (fs: 48 kHz, codec bitrate: 24 kbps)

Codec	Male (p232_005)	Female (p257_016)
Natural
symAD
AudioDec
ScoreDec
Opus
Opus_SPF

Codec	Male (p232_007)	Female (p257_080)
Natural
symAD
AudioDec
ScoreDec
Opus
Opus_SPF

symAD: symmetric AudioDec
AudioDec: AudioDec (with multi-group HiFi-GAN vocoder)
ScoreDec: the proposed ScoreDec
Opus: Opus at 24 kbps for mono audio
Opus_SPF: the proposed score-based post-filter combined with Opus
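For scale, the 24 kbps bitrate used for all codecs above works out as follows for mono 48 kHz audio. The comparison against 16-bit PCM is an illustrative assumption, since the page does not state the reference format of the natural recordings.

```latex
\frac{24\,000~\text{bit/s}}{48\,000~\text{samples/s}} = 0.5~\text{bit/sample},
\qquad
\frac{48\,000~\text{samples/s} \times 16~\text{bit/sample}}{24\,000~\text{bit/s}}
  = \frac{768~\text{kbit/s}}{24~\text{kbit/s}} = 32
```

That is, each codec spends half a bit per output sample, roughly a 32x reduction relative to uncompressed 16-bit mono PCM.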

Speech Quality Measurements

Waveform Similarity Comparison

Liability Disclaimer

The demo page uses a public speech dataset (VCTK) for demonstration purposes only, and we do not claim ownership of these speech samples. The content of the demo files is provided "as is" and for general informational purposes only. We make no warranties regarding its accuracy or suitability. If you believe that any speech samples infringe upon your rights or violate any laws, please contact us to remove the demo files. We are not liable for any damages arising from the use of or reliance on our demo page or open-source code. By accessing the demo page, you agree to release us from any claims or liabilities related to its use.


Page layout is modified from cayman-theme and cayman-blog (LICENSE).