EchoGuide: Active Acoustic Guidance for LLM-Based Eating Event Analysis from Egocentric Videos
Vineet Parikh, Saif Mahmud, Devansh Agarwal, Ke Li, François Guimbretière, Cheng Zhang

TL;DR
EchoGuide uses active acoustic sensing and advanced video analysis to efficiently capture and summarize eating behaviors from egocentric videos, improving accuracy and reducing data volume for health monitoring.
Contribution
This work introduces a novel system combining active acoustic sensing with video captioning and language models for detailed, scalable eating activity analysis from wearable devices.
Findings
High-quality summarization of eating activities achieved
Significant reduction in video data required
Effective detection and analysis in naturalistic settings
Abstract
Self-recording eating behaviors is a step towards a healthy lifestyle recommended by many health professionals. However, the current practice of manually recording eating activities using paper records or smartphone apps is often unsustainable and inaccurate. Smart glasses have emerged as a promising wearable form factor for tracking eating behaviors, but existing systems primarily identify when eating occurs without capturing details of the eating activities (e.g., what is being eaten). In this paper, we present EchoGuide, an application and system pipeline that leverages low-power active acoustic sensing to guide head-mounted cameras to capture egocentric videos, enabling efficient and detailed analysis of eating activities. By combining active acoustic sensing for eating detection with video captioning models and large-scale language models for retrieval augmentation, EchoGuide…
Taxonomy
Topics: Culinary Culture and Tourism · Music and Audio Processing
