Enabling Automatic Self-Talk Detection via Earables
Euihyeok Lee, Seonghyeon Kim, SangHun Im, Heung-Seon Oh, and Seungwoo Kang

TL;DR
This paper introduces MutterMeter, a mobile system that automatically detects vocalized self-talk from audio captured by earable microphones. It addresses the technical challenges of self-talk detection with a hierarchical classification approach and reaches a macro-averaged F1 score of 0.84 in real-world tests.
Contribution
MutterMeter is the first system to automatically detect vocalized self-talk in everyday environments using earables, combining acoustic, linguistic, and contextual data for improved accuracy.
Findings
Achieved a macro-averaged F1 score of 0.84 in real-world tests.
Outperformed conventional speech understanding and emotion recognition models.
Built on a new dataset of 31.1 hours from 25 participants.
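For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a macro-averaged F1 score is computed: F1 is calculated per class and then averaged with equal weight, so a rare class counts as much as a frequent one. The label names ("self_talk", "other") and the toy predictions are assumptions made for illustration; they are not the paper's label set or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Toy example with hypothetical labels (not data from the paper).
y_true = ["self_talk", "self_talk", "other", "other", "other", "other"]
y_pred = ["self_talk", "other", "other", "other", "other", "self_talk"]
score = macro_f1(y_true, y_pred)
print(score)
```

In this toy case the per-class F1 values are 0.75 for "other" and 0.5 for "self_talk", giving a macro average of 0.625.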
Abstract
Self-talk, an internal dialogue that can occur silently or be spoken aloud, plays a crucial role in emotional regulation, cognitive processing, and motivation, yet has remained largely invisible and unmeasurable in everyday life. In this paper, we present MutterMeter, a mobile system that automatically detects vocalized self-talk from audio captured by earable microphones in real-world settings. Detecting self-talk is technically challenging due to its diverse acoustic forms, semantic and grammatical incompleteness, and irregular occurrence patterns, which differ fundamentally from assumptions underlying conventional speech understanding models. To address these challenges, MutterMeter employs a hierarchical classification architecture that progressively integrates acoustic, linguistic, and contextual information through a sequential processing pipeline, adaptively balancing accuracy and…
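The sequential pipeline described in the abstract can be pictured with a small sketch. This is not MutterMeter's implementation: the abstract gives no stage internals, so the confidence-based early exit, the thresholds, the score-averaging rule, the label names, and the segment fields (`acoustic_p`, `text_p`, `context_p`) are all hypothetical choices made here only to show how acoustic, linguistic, and contextual stages might be chained.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]  # label -> probability


@dataclass
class Stage:
    name: str
    classify: Callable[[dict, Scores], Scores]  # (segment, prior scores) -> scores
    exit_threshold: float  # hypothetical: stop once top score reaches this


def binary(p_self_talk: float) -> Scores:
    return {"self_talk": p_self_talk, "other": 1.0 - p_self_talk}


# Toy stage functions: each reads a precomputed probability from the segment.
# Later stages average their evidence with the previous stage's score.
def acoustic(segment: dict, prior: Scores) -> Scores:
    return binary(segment["acoustic_p"])


def linguistic(segment: dict, prior: Scores) -> Scores:
    return binary((prior["self_talk"] + segment["text_p"]) / 2)


def contextual(segment: dict, prior: Scores) -> Scores:
    return binary((prior["self_talk"] + segment["context_p"]) / 2)


def run_cascade(segment: dict, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Run stages in order; exit early when a stage is confident enough."""
    scores: Scores = {}
    used: List[str] = []
    label = ""
    for stage in stages:
        scores = stage.classify(segment, scores)
        used.append(stage.name)
        label = max(scores, key=scores.get)
        if scores[label] >= stage.exit_threshold:
            break
    return label, used


STAGES = [
    Stage("acoustic", acoustic, 0.9),
    Stage("linguistic", linguistic, 0.8),
    Stage("contextual", contextual, 0.0),  # final stage always decides
]

easy = {"acoustic_p": 0.95, "text_p": 0.5, "context_p": 0.5}
hard = {"acoustic_p": 0.6, "text_p": 0.7, "context_p": 0.1}
print(run_cascade(easy, STAGES))
print(run_cascade(hard, STAGES))
```

Under these assumptions, a segment the first stage is confident about never reaches the later stages, while an ambiguous segment passes through all three before a label is assigned.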
Taxonomy
Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech Recognition and Synthesis
