Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach

Dawei Liang; Zifan Xu; Yinuo Chen; Rebecca Adaimi; David Harwath; Edison Thomaz

arXiv:2203.11294·cs.SD·March 23, 2022·1 cites

Open Access

TL;DR

This paper presents a transfer learning approach for detecting foreground speech from smartwatch audio in home environments, enabling social interaction detection without user-specific voice samples.

Contribution

It introduces a transfer learning method that leverages general speaker representations to identify foreground speech without requiring personalized voice data.
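The general idea of reusing speaker representations can be illustrated with a minimal sketch. It is not the authors' pipeline: the function fake_speaker_embeddings is a hypothetical stand-in for a frozen, pretrained speaker encoder, the data are synthetic, and the logistic-regression head is an assumed choice made only to show that a small classifier can be trained on top of fixed embeddings.

```python
import numpy as np


def fake_speaker_embeddings(n, dim, foreground, rng):
    """Hypothetical stand-in for a frozen, pretrained speaker encoder.

    Returns synthetic vectors; a real system would embed audio frames here.
    """
    center = 1.0 if foreground else -1.0
    return rng.normal(loc=center, scale=1.0, size=(n, dim))


def train_head(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression head on fixed embeddings by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of foreground
        grad = p - y                             # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def predict(X, w, b):
    """Label a frame as foreground (1) when the logit is positive."""
    return (X @ w + b > 0).astype(int)


rng = np.random.default_rng(0)
X = np.vstack([
    fake_speaker_embeddings(200, 16, True, rng),   # synthetic "foreground" frames
    fake_speaker_embeddings(200, 16, False, rng),  # synthetic "other sound" frames
])
y = np.concatenate([np.ones(200), np.zeros(200)])

w, b = train_head(X, y)
accuracy = (predict(X, w, b) == y).mean()
print(f"training accuracy on synthetic embeddings: {accuracy:.2f}")
```

Only the small head is trained; the encoder that produces the embeddings stays fixed, which is the part of the sketch that corresponds to transfer learning.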

Findings

01. Achieved 80% F1 score in detecting foreground speech
02. Performed comparably to fully supervised models
03. Collected 31 hours of smartwatch audio data in real homes
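For readers unfamiliar with the F1 metric cited in the findings, the short sketch below computes it from binary labels. The label lists are invented for illustration and are not data from the paper.

```python
def f1_score(y_true, y_pred):
    """Compute the F1 score (harmonic mean of precision and recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up frame labels: 1 = foreground speech, 0 = anything else.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score(y_true, y_pred)
print(score)
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and the F1 score is 0.75.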

Abstract

Acoustic sensing has proved effective as a foundation for numerous applications in health and human behavior analysis. In this work, we focus on the problem of detecting in-person social interactions in naturalistic settings from audio captured by a smartwatch. As a first step towards detecting social interactions, it is critical to distinguish the speech of the individual wearing the watch from all other sounds nearby, such as speech from other individuals and ambient sounds. This is very challenging in realistic settings, where interactions take place spontaneously and supervised models cannot be trained a priori to recognize the full complexity of dynamic social environments. In this paper, we introduce a transfer learning-based approach to detect foreground speech of users wearing a smartwatch. A highlight of the method is that it does not depend on the collection of voice samples to…

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing