No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration

Jose Vargas-Quiros; Laura Cabrera-Quiros; Hayley Hung

arXiv:2211.00549 · cs.CV · November 2, 2022

TL;DR

This paper proposes a novel, privacy-preserving method for detecting speaking status in crowded scenes by leveraging visual pose estimation and wearable acceleration data, improving robustness and efficiency over traditional approaches.

Contribution

It introduces a pose-based filtering technique for subject localization and combines visual and wearable sensor data for unobtrusive speech detection in crowded environments.

Findings

01 Pose-based filtering improves generalization in crowded scenes.

02 Combining visual and wearable data enhances detection accuracy.

03 The method reduces local feature complexity for efficiency.

Abstract

Recognizing who is speaking in a crowded scene is a key challenge towards understanding the social interactions taking place within it. Detecting speaking status from body movement alone opens the door to the analysis of social scenes in which personal audio is not obtainable. Video and wearable sensors make it possible to recognize speaking in an unobtrusive, privacy-preserving way. In the video modality, action recognition traditionally uses a bounding box to localize and segment out the target subject, and then recognizes the action taking place within it. However, cross-contamination, occlusion, and the articulated nature of the human body make this approach challenging in a crowded scene. Here, we leverage articulated body poses for subject localization and in the subsequent speech detection stage. We show that the selection of local features around pose…
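
As a rough illustration of the contrast drawn in the abstract, the Python sketch below crops small patches around confident pose keypoints instead of taking one bounding box per person. It is not the authors' implementation: the function name `crop_keypoint_patches`, the patch size, the confidence threshold, and the (x, y, confidence) keypoint layout are all assumptions made for this example.

```python
import numpy as np


def crop_keypoint_patches(frame, keypoints, half_size=8, min_conf=0.3):
    """Crop fixed-size square patches centred on confident pose keypoints.

    Hypothetical helper, for illustration only.
    frame: (H, W, C) image array.
    keypoints: (K, 3) array of (x, y, confidence) rows in pixel coordinates.
    Returns a dict mapping keypoint index -> (2*half_size, 2*half_size, C) patch.
    """
    h, w = frame.shape[:2]
    # Zero-pad the frame so patches near the border keep a fixed size.
    pad = ((half_size, half_size), (half_size, half_size), (0, 0))
    padded = np.pad(frame, pad, mode="constant")

    patches = {}
    for idx, (x, y, conf) in enumerate(keypoints):
        if conf < min_conf:
            continue  # skip low-confidence keypoints (e.g. occluded joints)
        cx, cy = int(round(float(x))), int(round(float(y)))
        if not (0 <= cx < w and 0 <= cy < h):
            continue  # skip keypoints that fall outside the frame
        # In padded coordinates the keypoint sits at (cy + half_size, cx + half_size),
        # so the patch spans rows cy .. cy + 2*half_size and likewise for columns.
        patches[idx] = padded[cy:cy + 2 * half_size, cx:cx + 2 * half_size]
    return patches


# Example: a blank 20x30 frame with four assumed keypoints.
frame = np.zeros((20, 30, 3), dtype=np.uint8)
keypoints = np.array([
    [5.0, 5.0, 0.9],    # confident, inside the frame
    [29.0, 19.0, 0.8],  # confident, at the frame corner
    [10.0, 10.0, 0.1],  # low confidence, dropped
    [40.0, 5.0, 0.9],   # outside the frame, dropped
])
patches = crop_keypoint_patches(frame, keypoints)
```

Restricting features to patches around keypoints keeps most pixels belonging to neighbouring people out of the input, which is the cross-contamination problem the abstract attributes to bounding boxes in crowded scenes.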

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Speech and Audio Processing