Gaze-Enhanced Multimodal Turn-Taking Prediction in Triadic Conversations

Seongsil Heo; Calvin Murdock; Michael Proulx; Christi Miller

arXiv:2505.13688·cs.HC·May 30, 2025

TL;DR

This paper presents a lightweight, privacy-conscious framework that integrates gaze and speaker localization to improve turn-taking prediction in triadic conversations, enhancing speech intelligibility in noisy environments.

Contribution

It introduces a novel method combining gaze and spatial cues for turn-taking prediction without heavy computation, advancing multimodal interaction modeling.

Findings

01. Gaze data from a single user improves prediction accuracy.

02. Multi-user gaze data further enhances prediction performance.

03. The approach supports adaptive sound control in noisy environments.

Abstract

Turn-taking prediction is crucial for seamless interactions. This study introduces a novel, lightweight framework for accurate turn-taking prediction in triadic conversations without relying on computationally intensive methods. Unlike prior approaches that either disregard gaze or treat it as a passive signal, our model integrates gaze with speaker localization, structuring it within a spatial constraint to transform it into a reliable predictive cue. Leveraging egocentric behavioral cues, our experiments demonstrate that incorporating gaze data from a single user significantly improves prediction performance, while gaze data from multiple users further enhances it by capturing richer conversational dynamics. This study presents a lightweight and privacy-conscious approach to support adaptive, directional sound control, enhancing speech intelligibility in noisy environments,…
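
The abstract does not spell out how gaze is structured "within a spatial constraint", so the following is only an illustrative sketch of one way such a cue could be formed, not the authors' implementation. It assumes that gaze and localized speakers are each reduced to a horizontal angle (azimuth, in degrees) in the wearer's egocentric frame, and that a gaze sample counts as directed at a speaker only when it falls within a fixed angular tolerance of that speaker. The function names, the azimuth representation, and the 15-degree default are all hypothetical.

```python
from typing import Optional, Sequence


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = (a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def gaze_target(
    gaze_azimuth_deg: float,
    speaker_azimuths_deg: Sequence[float],
    tolerance_deg: float = 15.0,  # hypothetical threshold, not from the paper
) -> Optional[int]:
    """Return the index of the localized speaker the gaze points at, or None.

    A gaze sample is only treated as a cue when it lies within
    `tolerance_deg` of a localized speaker (the assumed spatial constraint).
    If several speakers qualify, the angularly closest one is chosen.
    """
    best_index: Optional[int] = None
    best_diff = tolerance_deg
    for index, azimuth in enumerate(speaker_azimuths_deg):
        diff = angular_difference(gaze_azimuth_deg, azimuth)
        if diff <= best_diff:
            best_index, best_diff = index, diff
    return best_index


# Triadic setting: the wearer plus two partners at assumed azimuths.
partners = [-30.0, 35.0]
print(gaze_target(-28.0, partners))  # gaze close to the first partner
print(gaze_target(0.0, partners))    # gaze between partners, outside tolerance
```

In a sketch like this, the returned index (or its absence) would be one input feature to a turn-taking predictor alongside speech activity; gaze samples that do not land near any localized speaker are discarded rather than treated as evidence.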
