Audio samples from Team Lab Phonetics 2021, TTS with Prosody Control

Thomas Bott and Sebastian Sammet, University of Stuttgart, Supervisor: Florian Lux

Baseline Model trained on ~19h of Audio from the Blizzard Challenge 2013

"Can I help you with your project?"

[No prosody control]


Prosody Control Model trained on ~19h of Audio from the Blizzard Challenge 2013. The Model is conditioned on 7 Prosodic Parameters: Duration, Average Pitch, Minimum Pitch, Maximum Pitch, Average Energy, Minimum Energy, Maximum Energy.

"Can I help you with your project?"

One audio sample per Prosodic Parameter at each of the control values -1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0:

Duration
Pitch
Pitch Range (narrows as the value increases)
Energy
Energy Range (narrows as the value increases)
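
The sweep above can be sketched in a few lines of Python. This is only an illustration of how such a grid of control values might be built, not the code used for these samples. The parameter names are taken from the list of 7 Prosodic Parameters; the helper names (sweep_values, conditioning_vector), the vector layout, and the choice of 0.0 as the neutral value for all parameters that are not being varied are assumptions.

```python
# Hypothetical sketch: build the control values for a prosody sweep.
# The ordering of the parameters and the neutral value 0.0 are assumptions.

PARAMETERS = [
    "duration",
    "average_pitch", "minimum_pitch", "maximum_pitch",
    "average_energy", "minimum_energy", "maximum_energy",
]


def sweep_values(low=-1.0, high=1.0, steps=11):
    """Return evenly spaced control values from low to high (inclusive)."""
    step = (high - low) / (steps - 1)
    # Round to one decimal to avoid floating-point noise such as -0.3999...
    return [round(low + i * step, 1) for i in range(steps)]


def conditioning_vector(name, value, neutral=0.0):
    """Set one parameter to `value` and keep all others at `neutral`."""
    if name not in PARAMETERS:
        raise ValueError(f"unknown prosodic parameter: {name}")
    return [value if p == name else neutral for p in PARAMETERS]


values = sweep_values()

# One list of 11 conditioning vectors per parameter.
grid = {
    name: [conditioning_vector(name, v) for v in values]
    for name in PARAMETERS
}

for name in PARAMETERS:
    print(name, grid[name][0], "...", grid[name][-1])
```

Each vector in grid would then be passed to the synthesis model together with the input sentence. How the Pitch Range and Energy Range rows map onto the minimum and maximum parameters is not stated here, so the sketch varies each of the 7 parameters independently.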