6D Pose Estimation on Spoons and Hands
Kevin Tan, Fan Yang, Yuhao Chen

TL;DR
This paper develops a system for analyzing eating behaviors by estimating the 6D pose of hands and spoons from stationary video, aiming to improve dietary monitoring accuracy.
Contribution
It evaluates two state-of-the-art video object segmentation models for 6D pose estimation in eating scenarios, highlighting their performance and error sources.
Findings
SOTA models can track hand and spoon movements with reasonable accuracy.
Identification of key error sources in 6D pose estimation during eating.
Potential for improved dietary monitoring through pose analysis.
Abstract
Accurate dietary monitoring is essential for promoting healthier eating habits. A key area of research is how people interact with and consume food using utensils and hands. By tracking the position and orientation of these utensils and hands, it is possible to estimate the volume of food being consumed or to monitor eating behaviours. Both provide highly useful insights into nutritional intake that can be more reliable than popular methods such as self-reporting. Hence, this paper implements a system that analyzes a stationary video feed of people eating, using 6D pose estimation to track hand and spoon movements and capture their spatial position and orientation. In doing so, we examine the performance of two state-of-the-art (SOTA) video object segmentation (VOS) models, both quantitatively and qualitatively, and identify the main sources of error within the system.
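For readers unfamiliar with the term, a 6D pose combines a 3D position with a 3D orientation. The sketch below is illustrative only and is not taken from the paper: it assumes a pose is stored as a 3x3 rotation matrix plus a translation vector, and it compares two poses using the Euclidean translation distance and the geodesic rotation angle. The function names (`rotation_z`, `translation_error`, `rotation_error_deg`) and the sample values are hypothetical, and the abstract does not say which error metrics the authors actually use.

```python
import math

def rotation_z(theta):
    """Return a 3x3 rotation matrix for a rotation of theta radians about z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def translation_error(t_a, t_b):
    """Euclidean distance between two 3D positions (same units as the inputs)."""
    return math.dist(t_a, t_b)

def rotation_error_deg(R_a, R_b):
    """Geodesic angle in degrees between two rotation matrices."""
    # trace(R_a^T R_b) equals the sum of elementwise products of R_a and R_b.
    trace = sum(R_a[i][j] * R_b[i][j] for i in range(3) for j in range(3))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))

# Hypothetical spoon poses: a reference pose and an estimate that is
# off by 10 degrees about z and by 5 cm in the x-y plane (metres assumed).
reference_pose = (rotation_z(0.0), [0.0, 0.0, 0.5])
estimated_pose = (rotation_z(math.radians(10.0)), [0.03, 0.04, 0.5])

t_err = translation_error(reference_pose[1], estimated_pose[1])
r_err = rotation_error_deg(reference_pose[0], estimated_pose[0])
print(f"translation error: {t_err:.3f} m, rotation error: {r_err:.1f} deg")
```

Reporting translation and rotation error separately is one common way to show whether a tracker is failing on position, on orientation, or on both.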
Taxonomy
Topics: Hand Gesture Recognition Systems
