Code
https://github.com/Thommy96/IMS-Toucan
Audio Samples
Baseline - TTS system without an additional conditioning prompt
The baseline TTS system generates speech directly from the input text, without any additional conditioning prompt. It serves as the reference point for comparison.
Proposed - TTS system additionally conditioned on natural language prompts
The proposed TTS system additionally uses natural language prompts to control prosody. By conditioning the generation process on these prompts, it aims to produce more expressive and contextually appropriate speech. The prosody of the generated speech is therefore expected to depend on the (emotional) content of the prompt.
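The general idea of conditioning on a prompt can be illustrated with a minimal sketch. This page does not describe how IMS-Toucan actually injects the prompt, so the following is an assumption and not the repository's implementation. It supposes the prompt has already been turned into a fixed-size embedding (for example by a pretrained sentence encoder), projects that embedding to the text encoder's hidden size with a linear map, and adds the result to every token encoding before decoding. The names `project` and `condition_on_prompt` and all numbers are hypothetical.

```python
# Hypothetical sketch of prompt conditioning; not taken from IMS-Toucan.
from typing import List, Sequence

Vector = List[float]


def project(embedding: Sequence[float], weights: Sequence[Sequence[float]]) -> Vector:
    """Linear projection of a prompt embedding (size P) to the encoder size H.

    `weights` is an H x P matrix given as H rows of length P.
    """
    for row in weights:
        if len(row) != len(embedding):
            raise ValueError("each weight row must match the prompt embedding size")
    return [sum(w * e for w, e in zip(row, embedding)) for row in weights]


def condition_on_prompt(
    text_encodings: Sequence[Sequence[float]],
    prompt_embedding: Sequence[float],
    weights: Sequence[Sequence[float]],
) -> List[Vector]:
    """Add the projected prompt embedding to every token encoding.

    The same utterance-level vector is broadcast over all positions, so the
    prompt can shift the prosody of the whole utterance.
    """
    offset = project(prompt_embedding, weights)
    for token in text_encodings:
        if len(token) != len(offset):
            raise ValueError("projected prompt size must match the encoder size")
    return [[h + o for h, o in zip(token, offset)] for token in text_encodings]


# Toy example: 3 tokens with hidden size 2, and a prompt embedding of size 3.
text_encodings = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
prompt_embedding = [1.0, 0.0, 2.0]
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.5]]

baseline_input = text_encodings  # baseline: the decoder sees the text encodings only
proposed_input = condition_on_prompt(text_encodings, prompt_embedding, weights)
print(proposed_input)  # [[1.0, 2.0], [2.0, 1.0], [1.5, 1.5]]
```

Adding one utterance-level vector to every position is only one option; concatenating the prompt embedding before a projection, or attending over the prompt's token embeddings, are common alternatives.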
Using the Input Text as Prompt
Emotion | Input Sentence | Baseline | Proposed
---|---|---|---
Anger | You can't be serious, how dare you not tell me you were going to marry her? | | |
Joy | I really enjoy the beach in the summer. | | |
Neutral | You can go to the Employment Development Office and pick it up. | | |
Sadness | Lily broke up with me last week, in fact, she dumped me. | | |
Surprise | He was astonished when he saw them come alone, and asked what had happened to them. | | |
Using a Different Prompt
Prompt Emotion | Prompt | Input Sentence | Proposed
---|---|---|---
Anger | You can't be serious, how dare you not tell me you were going to marry her? | Lily broke up with me last week, in fact, she dumped me. | |
Joy | I really enjoy the beach in the summer. | You can go to the Employment Development Office and pick it up. | |
Neutral | You can go to the Employment Development Office and pick it up. | You can't be serious, how dare you not tell me you were going to marry her? | |
Sadness | Lily broke up with me last week, in fact, she dumped me. | He was astonished when he saw them come alone, and asked what had happened to them. | |
Surprise | He was astonished when he saw them come alone, and asked what had happened to them. | I really enjoy the beach in the summer. | |