An empirical user-study of text-based nonverbal annotation systems for human-human conversations
Joshua Y. Kim, Kalina Yacef

TL;DR
This study compares manual and automated multimodal transcription systems for human conversations, revealing usability differences and highlighting strengths and confusions in text-based annotations.
Contribution
It provides an empirical evaluation of three multimodal transcription methods, including a new visualization of machine attention, informing future system design.
Findings
MONAH is more usable than Jefferson.
All text-based methods significantly reduced the amount of information available to human users.
Enlarging words by machine attention caused confusion.
Abstract
With the substantial increase in the number of online human-human conversations and the usefulness of multimodal transcripts, there is a rising need for automated multimodal transcription systems to help us better understand the conversations. In this paper, we evaluated three methods to perform multimodal transcription. They were (1) Jefferson -- an existing manual system used widely by the linguistics community, (2) MONAH -- a system that aimed to make multimodal transcripts accessible and automated, and (3) MONAH+ -- a system that builds on MONAH and visualizes machine attention. Based on responses from 104 participants, we found that (1) all text-based methods significantly reduced the amount of information for the human users, (2) MONAH was found to be more usable than Jefferson, (3) Jefferson's relative strength was in chronemics (pace / delay) and paralinguistics (pitch / volume) annotations,…
Taxonomy
Topics: Speech and dialogue systems · Language, Metaphor, and Cognition · Digital Communication and Language
