6D Pose Estimation on Spoons and Hands

Kevin Tan; Fan Yang; Yuhao Chen

arXiv:2505.02335·cs.CV·May 6, 2025

TL;DR

This paper develops a system for analyzing eating behaviors by estimating the 6D pose of hands and spoons from stationary video, aiming to improve dietary monitoring accuracy.

Contribution

It evaluates two state-of-the-art video object segmentation models as part of a 6D pose estimation system for eating scenarios, assessing their performance quantitatively and qualitatively and identifying the main sources of error.

Findings

01

SOTA models can track hand and spoon movements with reasonable accuracy.

02

The evaluation identifies the main sources of error in 6D pose estimation during eating.

03

Pose analysis has the potential to make dietary monitoring more reliable than self-reporting.

Abstract

Accurate dietary monitoring is essential for promoting healthier eating habits. A key area of research is how people interact with and consume food using utensils and hands. By tracking the position and orientation of utensils and hands, it is possible to estimate the volume of food being consumed or to monitor eating behaviours. These insights into nutritional intake can be more reliable than popular methods such as self-reporting. Hence, this paper implements a system that analyzes stationary video feeds of people eating, using 6D pose estimation to track hand and spoon movements and capture their spatial position and orientation. In doing so, we examine the performance of two state-of-the-art (SOTA) video object segmentation (VOS) models, both quantitatively and qualitatively, and identify the main sources of error within the system.
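
As a rough illustration of what "6D pose" means here, the sketch below represents a pose as a 3x3 rotation matrix plus a 3D translation (three rotational and three translational degrees of freedom) and computes the pose of a spoon relative to a hand. This is not the paper's implementation: the function names, the camera-frame convention, and all numeric values are assumptions chosen only for the example.

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix for a rotation of theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_pose(R, t, p):
    """Map point p from the object frame to the camera frame: R @ p + t."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

def invert_pose(R, t):
    """Inverse of a rigid transform (R, t) is (R^T, -R^T t)."""
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return Rt, t_inv

def compose(Ra, ta, Rb, tb):
    """Compose two poses so that the result applies b first, then a."""
    R = [[sum(Ra[i][k] * Rb[k][j] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = apply_pose(Ra, ta, tb)  # Ra @ tb + ta
    return R, t

# Hypothetical camera-frame poses (metres); values are illustrative only.
R_hand, t_hand = rotation_z(math.pi / 2), [0.10, 0.00, 0.50]
R_spoon, t_spoon = rotation_z(math.pi / 2), [0.10, 0.05, 0.50]

# Pose of the spoon expressed in the hand's own frame: inverse(hand) * spoon.
R_inv, t_inv = invert_pose(R_hand, t_hand)
R_rel, t_rel = compose(R_inv, t_inv, R_spoon, t_spoon)
```

Expressing the spoon in the hand frame is one way to separate how the utensil is held from how the arm moves; whether the paper uses such a relative pose is not stated in the abstract.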

Taxonomy

Topics: Hand Gesture Recognition Systems