Detecting In-Person Conversations in Noisy Real-World Environments with Smartwatch Audio and Motion Sensing

Alice Zhang; Callihan Bertley; Dawei Liang; Edison Thomaz

arXiv:2507.12002·cs.LG·May 13, 2026

TL;DR

This paper presents a multimodal smartwatch-based system that detects face-to-face conversations from synchronized audio and motion data, reaching a macro F1-score of 82% in lab settings and 77.2% in semi-naturalistic settings.

Contribution

It introduces a novel neural network framework that fuses audio and inertial data for real-time conversation detection on commercial smartwatches.

Findings

1. Achieved 82% macro F1-score in lab settings.

2. Achieved 77.2% macro F1-score in semi-naturalistic environments.

3. Demonstrated real-time detection on a commercial smartwatch.
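
For readers unfamiliar with the metric reported above, macro F1 is the unweighted mean of the per-class F1 scores, so a rare class counts as much as a frequent one. The sketch below is a minimal illustration of that definition only; the function name `macro_f1` and the labels `"conversation"` and `"other"` are assumptions made for the example, not the paper's label set or evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical window-level labels, invented for illustration.
y_true = ["conversation"] * 3 + ["other"] * 5
y_pred = ["conversation", "conversation", "other",
          "other", "other", "other", "other", "conversation"]
score = macro_f1(y_true, y_pred)
```

In this toy example the per-class F1 scores are 2/3 for `"conversation"` and 0.8 for `"other"`, so the macro average is their plain mean, regardless of how many windows each class has.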

Abstract

Social interactions play a crucial role in shaping human behavior, relationships, and societies. They encompass various forms of communication, such as verbal conversation, non-verbal gestures, facial expressions, and body language. In this work, we develop a novel computational approach to detect face-to-face verbal conversations, a foundational aspect of human social interactions. We leverage multimodal data captured by a commodity smartwatch, specifically synchronizing microphone audio with 6-axis inertial signals (accelerometer and gyroscope). We design, train, and evaluate convolutional and attention-based neural networks using three different fusion methods to integrate the audio and motion modalities. To validate this framework, we conduct a lab study with 11 participants and a semi-naturalistic study with 24 participants. Our comprehensive evaluation demonstrates that fusing…
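
The abstract does not name its three fusion methods, so the following is only a generic sketch of two strategies commonly used to combine modalities: feature-level fusion (concatenating per-modality feature vectors before classification) and decision-level fusion (averaging per-modality class probabilities). All function names, the `audio_weight` parameter, and the example values are assumptions for illustration and may not correspond to the authors' design.

```python
def feature_level_fusion(audio_features, motion_features):
    """Concatenate per-modality feature vectors into one joint vector."""
    return list(audio_features) + list(motion_features)


def decision_level_fusion(audio_probs, motion_probs, audio_weight=0.5):
    """Weighted average of per-modality class probabilities, renormalized."""
    if len(audio_probs) != len(motion_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    fused = [audio_weight * a + (1.0 - audio_weight) * m
             for a, m in zip(audio_probs, motion_probs)]
    total = sum(fused)
    return [p / total for p in fused]


# Hypothetical values: a 2-dim audio embedding and a 1-dim motion embedding.
joint = feature_level_fusion([0.1, 0.2], [0.3])

# Hypothetical class probabilities ordered as [conversation, other].
fused = decision_level_fusion([0.8, 0.2], [0.4, 0.6])
```

Feature-level fusion lets a single classifier learn interactions between audio and motion, while decision-level fusion keeps the two models independent and is easier to run when one sensor stream is missing.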
