TL;DR
This paper demonstrates that deep learning models can detect individual intake gestures during eating directly from 360-degree video, with the best model reaching an F1 score of 0.858. It also highlights the importance of appearance features and temporal context.
Contribution
It introduces an approach to video-based intake gesture detection that applies deep learning to 360-degree video. This fills a gap in existing research, which has mostly relied on on-body sensors such as inertial and audio sensors.
Findings
Best model achieves an F1 score of 0.858
Appearance features are more important than motion features
Temporal context from multiple frames improves performance
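The F1 score reported above is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives (correctly detected gestures), false positives (spurious detections) and false negatives (missed gestures). It assumes those counts are already available; how detections are matched to labelled gestures is not specified in this summary, and the function name `f1_score` is illustrative only.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2 * precision * recall / (precision + recall), from raw counts.

    Equivalent closed form: 2*TP / (2*TP + FP + FN).
    Returns 0.0 when there are no true positives (precision and recall are 0).
    """
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of detections that are real gestures
    recall = tp / (tp + fn)     # share of real gestures that were detected
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not from the paper).
print(round(f1_score(tp=80, fp=10, fn=20), 4))
```

Because F1 is a harmonic mean, it penalises an imbalance between precision and recall more than an arithmetic mean would.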
Abstract
Automatic detection of individual intake gestures during eating occasions has the potential to improve dietary monitoring and support dietary recommendations. Existing studies typically use on-body solutions such as inertial and audio sensors, while video serves only as ground truth. Intake gesture detection based directly on video has rarely been attempted. In this study, we address this gap and show that deep learning architectures can successfully be applied to the problem of video-based detection of intake gestures. For this purpose, we collect and label video data of eating occasions using 360-degree video of 102 participants. Applying state-of-the-art approaches from video action recognition, our results show that (1) the best model achieves an F1 score of 0.858, (2) appearance features contribute more than motion features, and (3) temporal context in the form of multiple video…
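To illustrate what "temporal context in the form of multiple video frames" can mean in practice, the sketch below groups a frame sequence into fixed-length windows, one per frame, so that a model sees the current frame plus the frames preceding it. This is an assumption for illustration: the window length, whether the window is causal or centred, and the padding strategy are not given in this summary, and the helper `context_windows` is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def context_windows(frames: Sequence[T], window: int) -> List[List[T]]:
    """Return one window per frame index t: the `window` frames ending at t.

    Hypothetical helper. Windows near the start of the video are padded by
    repeating the first frame, so every window has exactly `window` items.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[List[T]] = []
    for t in range(len(frames)):
        start = t - window + 1
        # Indices before 0 are clamped to 0 (repeat the first frame).
        out.append([frames[max(i, 0)] for i in range(start, t + 1)])
    return out


# Frame identifiers stand in for decoded video frames.
print(context_windows([0, 1, 2, 3], window=3))
```

With `window=1` each frame is classified on its own, which corresponds to a single-frame model without temporal context.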
