How does a spontaneously speaking conversational agent affect user behavior?
Takahisa Iizuka, Hiroki Mori

TL;DR
This study shows that conversational agents with speech synthesized from spontaneous speech evoke more social responses and are perceived as more human-like by users.
Contribution
It demonstrates that speech synthesis based on spontaneous speech improves social interaction and perception of conversational agents.
Findings
Agents with spontaneous speech synthesis elicited shorter response times.
Users produced more backchannels when interacting with agents that used spontaneous speech synthesis.
Participants rated spontaneous speech agents as more human-like.
Abstract
This study investigated the effect of a conversational agent's synthetic voice, trained on spontaneous speech, on human interactants. Specifically, we hypothesized that humans will exhibit more social responses when interacting with a conversational agent that has a synthetic voice built on spontaneous speech. Typically, speech synthesizers are built on a speech corpus in which voice professionals read a set of written sentences. The synthesized speech is clear, as if a newscaster were reading the news or a voice actor were playing an anime character. However, this is quite different from the spontaneous speech we use in everyday conversation. Recent advances in speech synthesis have enabled us to build a speech synthesizer on a spontaneous speech corpus and to obtain near-conversational synthesized speech of reasonable quality. By making use of this technology, we examined whether humans produce…
