For each sentence, we compare a sample from our model against samples generated by two baselines and a target sample from the ESD dataset.
Target: Target samples taken from the ESD dataset.
emospeech: The baseline emospeech model.
cosyvoice2: The baseline cosyvoice2 model.
our EASPO: Our proposed EASPO model.
Sample 1 (Emotion: Angry)
Text: No, I burst the balloon!
Samples: Target, emospeech, cosyvoice2, our EASPO
Sample 2 (Emotion: Surprise)
Text: The football teams give a tea party.
Samples: Target, emospeech, cosyvoice2, our EASPO
Sample 3 (Emotion: Happy)
Text: That I owe my thanks to you.
Samples: Target, emospeech, cosyvoice2, our EASPO
Sample 4 (Emotion: Neutral)
Text: Poor Tom now is dead.
Samples: Target, emospeech, cosyvoice2, our EASPO
Sample 6 (Emotion: Sad)
Text: Must a name mean something?
Samples: Target, emospeech, cosyvoice2, our EASPO