AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec

Yi-Chiao Wu, Israel D. Gebru, Dejan Marković, and Alexander Richard
Meta Reality Labs Research, USA

This is the demo page for AudioDec. [paper] [code]

Abstract

A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e. the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e. encoding and decoding the signal needs to be fast enough to enable communication without or with only minimal noticeable delay; and (3) reconstruction quality of the signal. In this work, we propose an open-source, streamable, and real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency. An efficient training paradigm is also demonstrated for developing such neural audio codecs for real-world scenarios.

Architecture

Demo Sounds

VCTK (fs: 48 kHz) (codec bitrate: 12.8 kbps)

Codec	Female (p257_035)	Male (p232_400)
Natural
SoundStream (reimplementation)
symAD
symAD*
asymAD
AudioDec v0
AudioDec v1
AudioDec v2

symAD: symmetric AudioDec (autoencoder)
symAD*: symAD w/o fixing the encoder during the adversarial training
asymAD: asymmetric AudioDec (asymmetric autoencoder)
AudioDec v0: AudioDec w/ HiFi-GAN vocoder
AudioDec v1: the proposed AudioDec (encoder + multi-group vocoder)
AudioDec v2: smaller AudioDec (encoder + smaller multi-group vocoder)

LibriTTS (fs: 24 kHz) (codec bitrate: 6.4 kbps)

Codec	Female1	Male1
Natural
symAD
AudioDec v1

Codec	Female2	Male2
Natural
symAD
AudioDec v1
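
To put the two listed codec bitrates (12.8 kbps at 48 kHz and 6.4 kbps at 24 kHz) in context, the sketch below compares them with uncompressed audio. It assumes a 16-bit mono linear PCM baseline, which this page does not state. The second function shows how a bitrate of this kind can arise from a frame-based quantizer; the hop size (300 samples), number of codebooks (8), and codebook size (1024) are hypothetical values chosen only because they reproduce the listed figures, and the function names are illustrative rather than part of the AudioDec code.

```python
import math


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=1):
    """Bitrate of uncompressed linear PCM in kbps (assumed baseline)."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compression_ratio(sample_rate_hz, codec_kbps, bits_per_sample=16):
    """How many times smaller the codec stream is than mono PCM."""
    return pcm_bitrate_kbps(sample_rate_hz, bits_per_sample) / codec_kbps


def quantizer_bitrate_kbps(sample_rate_hz, hop_samples, n_codebooks, codebook_size):
    """Bitrate of a frame-based quantizer.

    Each frame of `hop_samples` samples is encoded as `n_codebooks` indices,
    and each index costs log2(codebook_size) bits. All arguments are
    hypothetical inputs, not values taken from this page.
    """
    frames_per_second = sample_rate_hz / hop_samples
    bits_per_frame = n_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame / 1000.0


if __name__ == "__main__":
    # Sampling rates and bitrates as listed in the demo section headings.
    for name, fs, kbps in [("VCTK", 48000, 12.8), ("LibriTTS", 24000, 6.4)]:
        ratio = compression_ratio(fs, kbps)
        print(f"{name}: PCM {pcm_bitrate_kbps(fs):.0f} kbps vs codec {kbps} kbps "
              f"(about {ratio:.0f}x smaller)")

    # Hypothetical quantizer settings that happen to match the listed bitrates.
    print(quantizer_bitrate_kbps(48000, hop_samples=300, n_codebooks=8, codebook_size=1024))
    print(quantizer_bitrate_kbps(24000, hop_samples=300, n_codebooks=8, codebook_size=1024))
```

Under the 16-bit mono assumption, both configurations come out at roughly a 60-fold reduction relative to PCM.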

Speech Quality Measurements

Latency Analysis (ms)

GPU: NVIDIA GeForce RTX3090

CPU: AMD Ryzen Threadripper 3970X w/ 4 threads
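
As a rough illustration of how per-call latency figures in milliseconds can be collected, the sketch below times a callable repeatedly and reports the median and worst case. It is not the authors' benchmark: `dummy_codec_step` is a hypothetical placeholder for one encode and decode call, the 480-sample frame in the usage example is an assumed value, and a GPU model would additionally need to synchronize the device inside the callable so that asynchronous kernels are included in the measurement.

```python
import statistics
import time


def measure_latency_ms(fn, n_warmup=5, n_runs=50):
    """Return (median, worst) wall-clock latency of fn() in milliseconds."""
    # Warm-up calls are discarded so one-time setup costs do not skew results.
    for _ in range(n_warmup):
        fn()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples), max(samples)


def frame_budget_ms(sample_rate_hz, frame_samples):
    """Duration of one audio frame; processing must finish within it to keep up."""
    return 1000.0 * frame_samples / sample_rate_hz


def dummy_codec_step():
    """Hypothetical stand-in workload for one streaming encode + decode call."""
    return sum(i * i for i in range(1000))


if __name__ == "__main__":
    median_ms, worst_ms = measure_latency_ms(dummy_codec_step)
    budget_ms = frame_budget_ms(48000, frame_samples=480)  # assumed frame size
    print(f"median {median_ms:.3f} ms, worst {worst_ms:.3f} ms, budget {budget_ms:.1f} ms")
```

A streaming codec keeps up with real time only if the measured latency per frame stays below the frame duration returned by `frame_budget_ms`.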

Liability Disclaimer

The demo page uses public speech datasets (VCTK and LibriTTS) for demonstration purposes only, and we do not claim ownership over these speech samples. The content of the demo files is provided "as is" and for general informational purposes only. We make no warranties regarding its accuracy or suitability. If you believe that any speech samples infringe upon your rights or violate any laws, please contact us to remove the demo files. We are not liable for any damages arising from the use of or reliance on our demo page or open-source code. By accessing the demo page and using the open-source code, you agree to release us from any claims or liabilities related to their use.

Page layout is modified from cayman-theme and cayman-blog. LICENSE