TL;DR
This paper presents MONAH, a system that automatically enriches conversation transcripts with multimodal data to improve the detection of rapport-building, making conversational analysis more efficient and interpretable.
Contribution
The paper introduces a novel system that weaves multimodal annotations into conversation transcripts and shows that these annotations improve the detection of rapport-building.
Findings
The paper identifies the range of multimodal features that are relevant to detecting rapport-building.
The system automates the weaving of multimodal data into transcripts.
Expanded multimodal annotations lead to statistically significant performance gains.
Abstract
In conversational analyses, humans manually weave multimodal information into the transcripts, a highly time-consuming process. We introduce a system that automatically expands the verbatim transcripts of video-recorded conversations using multimodal data streams. This system uses a set of preprocessing rules to weave multimodal annotations into the verbatim transcripts and promote interpretability. Our feature engineering contributions are two-fold: firstly, we identify the range of multimodal features relevant to detect rapport-building; secondly, we expand the range of multimodal annotations and show that the expansion leads to statistically significant improvements in detecting rapport-building.
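The abstract does not spell out the preprocessing rules. As a rough illustration only, the following sketch assumes one simple rule: every multimodal event (for example a nod or a downward gaze) produced by a speaker and overlapping that speaker's utterance in time is woven into the utterance as a parenthetical annotation. The `Utterance` and `Event` types, the `weave` function, and the example labels are hypothetical and are not taken from MONAH's implementation.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    start: float  # seconds from the start of the recording
    end: float
    text: str


@dataclass
class Event:
    speaker: str
    start: float
    end: float
    label: str  # hypothetical narrative label, e.g. "nods"


def weave(utterances, events):
    """Hypothetical rule: annotate each utterance with the same speaker's
    events whose time span overlaps the utterance's time span."""
    lines = []
    for u in utterances:
        labels = [
            e.label
            for e in events
            if e.speaker == u.speaker and e.start < u.end and e.end > u.start
        ]
        labels = list(dict.fromkeys(labels))  # drop duplicate labels, keep order
        prefix = f"{u.speaker} ({', '.join(labels)}): " if labels else f"{u.speaker}: "
        lines.append(prefix + u.text)
    return lines


utts = [
    Utterance("Client", 0.0, 3.0, "I have been feeling stuck."),
    Utterance("Counsellor", 3.5, 6.0, "That sounds difficult."),
]
evts = [
    Event("Client", 1.0, 2.0, "looks down"),
    Event("Counsellor", 4.0, 5.0, "nods"),
    Event("Counsellor", 7.0, 8.0, "smiles"),  # outside any utterance, so ignored
]

for line in weave(utts, evts):
    print(line)
```

Keeping the annotations as plain words inside the transcript, rather than as a separate numeric feature table, mirrors the abstract's stated aim of promoting interpretability: a reader (or a text model) sees the multimodal cue next to the words it accompanied.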
