Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach
Dawei Liang, Zifan Xu, Yinuo Chen, Rebecca Adaimi, David Harwath, Edison Thomaz

TL;DR
This paper presents a transfer learning approach for detecting foreground speech from smartwatch audio in home environments, enabling social interaction detection without user-specific voice samples.
Contribution
It introduces a transfer learning method that leverages general speaker representations to identify foreground speech without requiring personalized voice data.
Findings
Achieved 80% F1 score in detecting foreground speech
Performed comparably to fully supervised models
Collected 31 hours of smartwatch audio data in real homes
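For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that calculation, not the authors' evaluation code: the function name, the per-frame framing, and the label arrays are hypothetical and chosen only for illustration (1 = foreground speech, 0 = anything else).

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1 score of the positive class for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels, invented for this example (not data from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

print(f"F1 = {f1_score(y_true, y_pred):.2f}")
```

In this toy case there are four true positives, one false positive, and one false negative, so precision and recall are both 4/5.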
Abstract
Acoustic sensing has proved effective as a foundation for numerous applications in health and human behavior analysis. In this work, we focus on the problem of detecting in-person social interactions in naturalistic settings from audio captured by a smartwatch. As a first step towards detecting social interactions, it is critical to distinguish the speech of the individual wearing the watch from all other sounds nearby, such as speech from other individuals and ambient sounds. This is very challenging in realistic settings, where interactions take place spontaneously and supervised models cannot be trained a priori to recognize the full complexity of dynamic social environments. In this paper, we introduce a transfer learning-based approach to detect foreground speech of users wearing a smartwatch. A highlight of the method is that it does not depend on the collection of voice samples to…
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing
