AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec
This is the demo page for AudioDec. [paper] [code]
Abstract
A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e. the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e. encoding and decoding the signal needs to be fast enough to enable communication without or with only minimal noticeable delay; and (3) reconstruction quality of the signal. In this work, we propose an open-source, streamable, and real-time neural audio codec that achieves strong performance along all three axes: it can reconstruct highly natural sounding 48 kHz speech signals while operating at only 12 kbps and running with less than 6 ms (GPU)/10 ms (CPU) latency. An efficient training paradigm is also demonstrated for developing such neural audio codecs for real-world scenarios.
Architecture
Demo Sounds
- VCTK (fs: 48 kHz) (codec bitrate: 12.8 kbps)
Codec | Female (p257_035) | Male (p232_400) |
---|---|---|
Natural | ||
SoundStream (reimplement) | ||
symAD | ||
symAD* | ||
asymAD | ||
AudioDec v0 | ||
AudioDec v1 | ||
AudioDec v2 | ||
symAD: symmetric AudioDec (autoencoder)
symAD*: symAD w/o fixing the encoder during the adversarial training
asymAD: asymmetric AudioDec (asymmetric autoencoder)
AudioDec v0: AudioDec w/ HiFi-GAN vocoder
AudioDec v1: the proposed AudioDec (encoder + multi-group vocoder)
AudioDec v2: smaller AudioDec (encoder + smaller multi-group vocoder)
- LibriTTS (fs: 24 kHz) (codec bitrate: 6.4 kbps)
Codec | Female1 | Male1 |
---|---|---|
Natural | ||
symAD | ||
AudioDec v1 | ||
Codec | Female2 | Male2 |
---|---|---|
Natural | ||
symAD | ||
AudioDec v1 | ||
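
The two codec bitrates quoted above (12.8 kbps at 48 kHz and 6.4 kbps at 24 kHz) are consistent with a fixed number of bits per code frame and a frame rate that scales with the sampling rate. The sketch below reproduces that arithmetic. The hop size of 300 samples, the 8 codebooks, and the 1024 entries per codebook are assumptions made for illustration and are not stated on this page; the function name `codec_bitrate_kbps` is hypothetical. Check the values against the paper and the released configuration files before relying on them.

```python
import math


def codec_bitrate_kbps(sample_rate_hz, hop_size, num_codebooks, codebook_size):
    """Bitrate of a frame-based codec that sends one index per codebook per frame."""
    # Number of code frames produced per second of audio.
    frames_per_second = sample_rate_hz / hop_size
    # Each codebook index costs log2(codebook_size) bits.
    bits_per_frame = num_codebooks * math.log2(codebook_size)
    return frames_per_second * bits_per_frame / 1000.0


# Assumed values (not given on this page): hop size 300, 8 codebooks, 1024 entries each.
vctk_kbps = codec_bitrate_kbps(48000, 300, 8, 1024)      # 48 kHz VCTK setting
libritts_kbps = codec_bitrate_kbps(24000, 300, 8, 1024)  # 24 kHz LibriTTS setting

print(f"VCTK:     {vctk_kbps:.1f} kbps")
print(f"LibriTTS: {libritts_kbps:.1f} kbps")
```

Under these assumptions, halving the sampling rate with the same hop size halves the frame rate, which is why the LibriTTS bitrate is half the VCTK bitrate.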
Speech Quality Measurements
Latency Analysis (ms)
- GPU: NVIDIA GeForce RTX3090
- CPU: AMD Ryzen Threadripper 3970X w/ 4 threads
Liability Disclaimer
The demo page uses public speech datasets (VCTK and LibriTTS) for demonstration purposes only, and we do not claim ownership of these speech samples. The content of the demo files is provided "as is" and for general informational purposes only. We make no warranties regarding its accuracy or suitability. If you believe that any speech samples infringe upon your rights or violate any laws, please contact us to have the demo files removed. We are not liable for any damages arising from the use of or reliance on our demo page or open-source code. By accessing the demo page and using the open-source code, you agree to release us from any claims or liabilities related to their use.
Page layout is modified from cayman-theme and cayman-blog. LICENSE