What makes a good pause? Investigating the turn-holding effects of fillers
Bing'er Jiang, Erik Ekstedt, Gabriel Skantze

TL;DR
This study uses a deep learning model to investigate how filled pauses such as 'uh' and 'um' influence turn-holding in conversation, finding that the effect depends on the filler's prosody and position but not on the filler type.
Contribution
It introduces the use of the Voice Activity Projection model to analyze turn-holding effects of fillers, highlighting the roles of prosody and position over filler type.
Findings
Filled pauses have a turn-holding effect, but less than expected.
Prosodic properties and position significantly influence turn-hold probability.
No difference was found between 'uh' and 'um' in their turn-holding effect.
Abstract
Filled pauses (or fillers), such as "uh" and "um", are frequent in spontaneous speech and can serve as a turn-holding cue for the listener, indicating that the current speaker is not done yet. In this paper, we use the recently proposed Voice Activity Projection (VAP) model, which is a deep learning model trained to predict the dynamics of conversation, to analyse the effects of filled pauses on the expected turn-hold probability. The results show that, while filled pauses do indeed have a turn-holding effect, it is perhaps not as strong as could be expected, probably due to the redundancy of other cues. We also find that the prosodic properties and position of the filler have a significant effect on the turn-hold probability. However, contrary to what has been suggested in previous work, there is no difference between "uh" and "um" in this regard.
Taxonomy
Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Speech and dialogue systems
