No-audio speaking status detection in crowded settings via visual pose-based filtering and wearable acceleration
Jose Vargas-Quiros, Laura Cabrera-Quiros, Hayley Hung

TL;DR
This paper proposes a novel, privacy-preserving method for detecting speaking status in crowded scenes by leveraging visual pose estimation and wearable acceleration data, improving robustness and efficiency over traditional bounding-box-based approaches.
Contribution
It introduces a pose-based filtering technique for subject localization and combines visual and wearable sensor data for unobtrusive speech detection in crowded environments.
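The summary does not say how the visual and wearable streams are combined. One common option is late fusion, where each modality produces a per-window speaking probability and the two are merged. The sketch below assumes a weighted average of scores followed by a threshold; `late_fusion`, `w_video`, and `threshold` are hypothetical names, and this is not necessarily the fusion strategy used in the paper.

```python
from typing import Sequence


def late_fusion(p_video: Sequence[float], p_accel: Sequence[float],
                w_video: float = 0.5, threshold: float = 0.5) -> list[int]:
    """Fuse per-window speaking probabilities from video and acceleration.

    Hypothetical late-fusion rule (weighted average of the two scores),
    shown for illustration only; not claimed to be the paper's method.
    """
    if len(p_video) != len(p_accel):
        raise ValueError("both modalities must cover the same time windows")
    if not 0.0 <= w_video <= 1.0:
        raise ValueError("w_video must lie in [0, 1]")
    # Weighted average of the two modality scores for each window.
    fused = [w_video * v + (1.0 - w_video) * a for v, a in zip(p_video, p_accel)]
    # Binarize: 1 = speaking, 0 = not speaking.
    return [1 if s >= threshold else 0 for s in fused]


video_scores = [0.9, 0.2, 0.6, 0.4]
accel_scores = [0.7, 0.1, 0.3, 0.8]
print(late_fusion(video_scores, accel_scores))  # → [1, 0, 0, 1]
```

Setting `w_video` to 1.0 or 0.0 reduces the rule to a single modality, which is a convenient baseline when comparing unimodal and fused detection.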
Findings
Pose-based filtering improves generalization in crowded scenes.
Combining visual and wearable data enhances detection accuracy.
The method reduces the complexity of the local features, improving efficiency.
Abstract
Recognizing who is speaking in a crowded scene is a key challenge towards understanding the social interactions taking place within it. Detecting speaking status from body movement alone opens the door to the analysis of social scenes in which personal audio is not obtainable. Video and wearable sensors make it possible to recognize speaking in an unobtrusive, privacy-preserving way. For the video modality, action recognition approaches traditionally use a bounding box to localize and segment out the target subject, and then recognize the action taking place within it. However, cross-contamination, occlusion, and the articulated nature of the human body make this approach challenging in a crowded scene. Here, we leverage articulated body poses both for subject localization and in the subsequent speech detection stage. We show that the selection of local features around pose…
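The abstract is truncated, so the exact feature-selection rule is not stated here. The sketch below only illustrates the general contrast between bounding-box and pose-based localization, assuming that local features are 2D image locations and that "selection around pose" means keeping features within a fixed pixel radius of the target's keypoints. `filter_by_pose`, `filter_by_box`, `radius`, and all coordinates are hypothetical.

```python
import math
from typing import Iterable, Sequence

Point = tuple[float, float]


def filter_by_box(features: Iterable[Point],
                  box: tuple[float, float, float, float]) -> list[Point]:
    """Keep features inside an axis-aligned box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return [f for f in features if x0 <= f[0] <= x1 and y0 <= f[1] <= y1]


def filter_by_pose(features: Iterable[Point], keypoints: Sequence[Point],
                   radius: float) -> list[Point]:
    """Keep features within `radius` pixels of any of the target's keypoints.

    Hypothetical pose-based filter for illustration only.
    """
    return [f for f in features
            if any(math.hypot(f[0] - k[0], f[1] - k[1]) <= radius
                   for k in keypoints)]


# Target subject: head and right wrist keypoints, plus a bounding box around them.
target_keypoints = [(100, 50), (130, 120)]
target_box = (80, 40, 150, 200)

features = [
    (102, 55),   # near the target's head
    (128, 118),  # near the target's wrist
    (140, 60),   # a neighbour's hand overlapping the target's box
    (300, 80),   # far away from the target
]

print(filter_by_box(features, target_box))           # → [(102, 55), (128, 118), (140, 60)]
print(filter_by_pose(features, target_keypoints, 20.0))  # → [(102, 55), (128, 118)]
```

In this toy frame the bounding box admits the neighbour's hand (cross-contamination), while the pose-based filter keeps only features close to the target's own body parts and also yields fewer features to process.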
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Speech and Audio Processing
