This page hosts the demos for the following papers:

  1. “Non-parallel voice conversion system with WaveNet vocoder and collapsed speech suppression” [paper]
  2. “Collapsed speech segment detection and suppression for WaveNet vocoder” [paper] [code]
  3. “The NU non-parallel voice conversion system for the voice conversion challenge 2018” [paper]

Abstract

We integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when the WN vocoder is combined with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) that avoids the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) so that the LPCDC is applied only to the problematic segments, which confines any loss of quality to short periods.

Testing corpus: VCC2018

Collapsed speech problem

Figure (a): WN-generated waveforms w/ collapsed speech.
Figure (b): WN-generated waveforms w/ LPCDC and CSSD.

Collapsed speech segment detection (CSSD)

Figure (a): WN-generated waveforms w/ collapsed speech.
Figure (b): WORLD-generated waveforms (reference).
Figure (c): Extracted waveform envelopes.
Figure (d): Difference in waveform envelope.
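
The comparison shown in panels (a) to (d) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the envelope extractor (a moving average of the absolute amplitude), the window length `win`, the `threshold` value, and the synthetic signals are all assumptions made for this example.

```python
import numpy as np

def envelope(wave, win=400):
    """Rough amplitude envelope: moving average of |x| (assumed method)."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(wave), kernel, mode="same")

def detect_collapsed(wn_wave, ref_wave, win=400, threshold=0.1):
    """Return a boolean mask that is True where the WN envelope exceeds
    the reference envelope by more than `threshold` (hypothetical value)."""
    n = min(len(wn_wave), len(ref_wave))
    diff = envelope(wn_wave[:n], win) - envelope(ref_wave[:n], win)
    return diff > threshold

# Synthetic stand-ins: a quiet sine plays the role of the WORLD-generated
# reference, and a copy with a loud noise burst plays the role of the
# WN-generated waveform containing a collapsed segment.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
ref = 0.2 * np.sin(2 * np.pi * 200 * t)
wn = ref.copy()
wn[6000:8000] = rng.uniform(-0.9, 0.9, size=2000)

mask = detect_collapsed(wn, ref)
flagged = np.flatnonzero(mask)
print("flagged sample range:", flagged.min(), "to", flagged.max())
```

Only the samples around the noise burst are flagged, so a constraint such as the LPCDC could be limited to that region.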

LPC distribution constraint (LPCDC)
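
As a rough illustration of the idea stated in the abstract (deriving a constraint from conventional-vocoder-generated reference speech), one possible reading is sketched below: LPC coefficients are estimated from a reference frame, the next sample is predicted from recent samples, and the vocoder's output distribution is reweighted by a mask centred on that prediction. The Gaussian mask shape, `sigma`, the LPC `order`, the linear quantization levels, and all function names are assumptions for this example and are not taken from the papers.

```python
import numpy as np

def lpc_coefficients(frame, order=4):
    """Autocorrelation-method LPC: solve the normal equations R a = r."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * np.eye(order)  # small ridge term for numerical safety
    return np.linalg.solve(R, r[1:order + 1])

def lpc_predict(history, coeffs):
    """Predict the next sample from the most recent len(coeffs) samples."""
    recent_first = history[::-1][:len(coeffs)]
    return float(np.dot(coeffs, recent_first))

def constrain_distribution(probs, levels, prediction, sigma=0.05):
    """Reweight `probs` with a Gaussian mask centred on the LPC prediction
    (assumed mask shape) and renormalize so the result sums to one."""
    mask = np.exp(-0.5 * ((levels - prediction) / sigma) ** 2)
    constrained = probs * mask
    return constrained / constrained.sum()

# Synthetic reference frame standing in for conventional-vocoder speech.
rng = np.random.default_rng(0)
t = np.arange(512) / 16000.0
frame = 0.5 * np.sin(2 * np.pi * 200 * t) + rng.normal(0.0, 0.01, size=512)

coeffs = lpc_coefficients(frame, order=4)
prediction = lpc_predict(frame, coeffs)

# A flat distribution mimics an uninformative vocoder output.
levels = np.linspace(-1.0, 1.0, 256)
probs = np.full(256, 1.0 / 256)
constrained = constrain_distribution(probs, levels, prediction)
print("most probable level after constraint:", levels[np.argmax(constrained)])
```

With a flat input distribution, the constrained distribution peaks at the quantization level closest to the LPC prediction, which is how such a mask would steer sampling away from values far from the reference.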

WN vocoder with CSSD and LPCDC

Speaker voice conversion (Non-parallel)

System | Female (SF4->TF1) | Male (SM3->TM1)
Source
Target
Target + WN
Collapsed-free utterances:
  DNN + WN
  DMDN + WN
  DMDN + WORLD
  DMDN + LPCDC
Collapsed utterances:
  DNN + WN
  DMDN + WN
  DMDN + WORLD
  DMDN + LPCDC
  DMDN + LPCDC + CSSD


System | Female (SF3->TM2) | Male (SM4->TF2)
Source
Target
Target + WN
Collapsed-free utterances:
  DNN + WN
  DMDN + WN
  DMDN + WORLD
  DMDN + LPCDC
Collapsed utterances:
  DNN + WN
  DMDN + WN
  DMDN + WORLD
  DMDN + LPCDC
  DMDN + LPCDC + CSSD



