Code

Audio Samples

Baseline - TTS system without additional conditioning prompt

The baseline TTS system generates speech directly from the given text input, without any additional conditioning prompt. It serves as the reference point for comparison with the proposed system.

Proposed - TTS system additionally conditioned on natural language prompts

The proposed TTS system is additionally conditioned on a natural language prompt to control prosody. Conditioning the generation process on the prompt is intended to yield more expressive and contextually appropriate speech. The prosody of the generated speech is therefore expected to reflect the (emotional) content of the prompt.
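The difference between the two systems can be illustrated with a toy sketch. This page does not say how the prompt is injected into the model, so everything below is an assumption made for illustration: the prompt is pooled into a single vector and added to every text-encoder state. The names `word_vector`, `encode_text`, `embed_prompt`, `decoder_input`, and `DIM` are hypothetical, and the "encoders" are deterministic pseudo-random stand-ins, not trained models.

```python
import random
import zlib
from typing import List, Optional

DIM = 4  # hypothetical hidden size, chosen only for this toy example


def word_vector(word: str) -> List[float]:
    """Toy stand-in for a learned embedding: a deterministic vector per word."""
    rng = random.Random(zlib.crc32(word.lower().encode("utf-8")))
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def encode_text(text: str) -> List[List[float]]:
    """Toy text encoder: one DIM-sized state per whitespace-separated word."""
    return [word_vector(w) for w in text.split()]


def embed_prompt(prompt: str) -> List[float]:
    """Toy prompt encoder: mean-pool the word vectors into one DIM-sized vector."""
    vectors = encode_text(prompt)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def decoder_input(text: str, prompt: Optional[str] = None) -> List[List[float]]:
    """Baseline when prompt is None; otherwise add the prompt vector to every state.

    Additive conditioning is an assumption of this sketch, not the documented method.
    """
    states = encode_text(text)
    if prompt is None:
        return states
    prompt_vec = embed_prompt(prompt)
    return [[s + p for s, p in zip(state, prompt_vec)] for state in states]


text = "I really enjoy the beach in the summer."
baseline = decoder_input(text)                 # no conditioning prompt
self_prompted = decoder_input(text, text)      # input text used as prompt
cross_prompted = decoder_input(
    text, "Lily broke up with me last week, in fact, she dumped me."
)
```

In this sketch the same input text yields different decoder inputs depending on the prompt, which is the property the proposed system relies on for prosody control.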

Using the Input Text as Prompt

Emotion	Input Sentence	Baseline	Proposed
Anger	You can't be serious, how dare you not tell me you were going to marry her?	[audio]	[audio]
Joy	I really enjoy the beach in the summer.	[audio]	[audio]
Neutral	You can go to the Employment Development Office and pick it up.	[audio]	[audio]
Sadness	Lily broke up with me last week, in fact, she dumped me.	[audio]	[audio]
Surprise	He was astonished when he saw them come alone, and asked what had happened to them.	[audio]	[audio]

Using a Different Prompt

Emotion	Prompt	Input Sentence
Anger	You can't be serious, how dare you not tell me you were going to marry her?	Lily broke up with me last week, in fact, she dumped me.
Joy	I really enjoy the beach in the summer.	You can go to the Employment Development Office and pick it up.
Neutral	You can go to the Employment Development Office and pick it up.	You can't be serious, how dare you not tell me you were going to marry her?
Sadness	Lily broke up with me last week, in fact, she dumped me.	He was astonished when he saw them come alone, and asked what had happened to them.
Surprise	He was astonished when he saw them come alone, and asked what had happened to them.	I really enjoy the beach in the summer.
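The two prompt settings shown on this page can be written down as data. The sketch below rebuilds the rows of both tables from the five sentences. The names `SENTENCES`, `CROSS_PAIRING`, `same_prompt_rows`, and `cross_prompt_rows` are hypothetical. The pairing is copied from the table above, on the assumption that the Emotion column labels the prompt rather than the input sentence.

```python
from typing import Dict, List, Tuple

# One example sentence per emotion, as listed in the tables on this page.
SENTENCES: Dict[str, str] = {
    "Anger": "You can't be serious, how dare you not tell me you were going to marry her?",
    "Joy": "I really enjoy the beach in the summer.",
    "Neutral": "You can go to the Employment Development Office and pick it up.",
    "Sadness": "Lily broke up with me last week, in fact, she dumped me.",
    "Surprise": "He was astonished when he saw them come alone, and asked what had happened to them.",
}

# Prompt emotion -> emotion of the sentence that is spoken, read off the second table.
CROSS_PAIRING: Dict[str, str] = {
    "Anger": "Sadness",
    "Joy": "Neutral",
    "Neutral": "Anger",
    "Sadness": "Surprise",
    "Surprise": "Joy",
}

Row = Tuple[str, str, str]  # (emotion, prompt, input sentence)


def same_prompt_rows() -> List[Row]:
    """First setting: each input sentence doubles as its own prompt."""
    return [(emotion, s, s) for emotion, s in SENTENCES.items()]


def cross_prompt_rows() -> List[Row]:
    """Second setting: the prompt comes from a different emotion than the input."""
    return [
        (emotion, SENTENCES[emotion], SENTENCES[CROSS_PAIRING[emotion]])
        for emotion in SENTENCES
    ]


for emotion, prompt, sentence in cross_prompt_rows():
    print(f"{emotion}\t{prompt}\t{sentence}")
```

Each sentence appears exactly once as a prompt and once as an input, and no sentence is paired with itself, so any emotional prosody in the second setting has to come from the prompt and not from the spoken text.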