Implicit-Scale 3D Reconstruction for Multi-Food Volume Estimation from Monocular Images
Yuhao Chen, Gautham Vinod, Siddeshwar Raghavan, Talha Ibn Mahmud, Bruce Coburn, Jinge Ma, Fengqing Zhu, Jiangpeng He

TL;DR
This paper introduces a new benchmark dataset for 3D food reconstruction from monocular images, emphasizing implicit scale inference and geometric reasoning to improve portion estimation accuracy in realistic dining scenarios.
Contribution
The paper presents a novel benchmark dataset and frames food volume estimation as an implicit-scale 3D reconstruction problem, advancing geometric methods over appearance-based approaches.
Findings
Geometry-based methods outperform vision-language models in accuracy.
Top methods achieve 0.21 MAPE in volume estimation.
Benchmark highlights challenges of scale ambiguity and occlusion in real-world scenes.
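For readers unfamiliar with the metric, MAPE (mean absolute percentage error) is the average of |predicted - actual| / |actual| over all evaluated items, so 0.21 corresponds to an average error of about 21% of the true volume. The sketch below illustrates the standard definition only; the function name and the sample volumes are hypothetical, and whether the benchmark averages per food item or per scene is not stated here.

```python
def mape(predicted, actual):
    """Mean absolute percentage error, returned as a fraction (0.21 means 21%)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("actual values must be non-zero")
    # Average the per-item relative errors.
    return sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical volumes in millilitres, made up for illustration only.
predicted = [120.0, 80.0, 330.0]
actual = [100.0, 100.0, 300.0]
score = mape(predicted, actual)
print(f"MAPE = {score:.3f}")
```

Because each error is divided by the true volume, small items weigh as much as large ones in the average.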
Abstract
We present Implicit-Scale 3D Reconstruction from Monocular Multi-Food Images, a benchmark dataset designed to advance geometry-based food portion estimation in realistic dining scenarios. Existing dietary assessment methods largely rely on single-image analysis or appearance-based inference, including recent vision-language models, which lack explicit geometric reasoning and are sensitive to scale ambiguity. This benchmark reframes food portion estimation as an implicit-scale 3D reconstruction problem under monocular observations. To reflect real-world conditions, explicit physical references and metric annotations are removed; instead, contextual objects such as plates and utensils are provided, requiring algorithms to infer scale from implicit cues and prior knowledge. The dataset emphasizes multi-food scenes with diverse object geometries, frequent occlusions, and complex spatial…
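To make the role of implicit scale concrete: a monocular reconstruction is typically defined only up to an unknown scale factor, and volume grows with the cube of that factor. The sketch below is a minimal illustration, not the benchmark's method. It assumes a contextual object (here a plate) whose real-world diameter is supplied as a prior; the function name, parameters, and numbers are all hypothetical.

```python
def metric_volume(unitless_volume, reconstructed_ref_size, assumed_ref_size_cm):
    """Convert a volume from arbitrary reconstruction units to cubic centimetres.

    reconstructed_ref_size: size of the reference object (e.g. plate diameter)
        measured in the reconstruction's arbitrary units.
    assumed_ref_size_cm: prior belief about that object's real size in cm
        (an assumption, not a measured annotation).
    """
    if reconstructed_ref_size <= 0 or assumed_ref_size_cm <= 0:
        raise ValueError("reference sizes must be positive")
    scale = assumed_ref_size_cm / reconstructed_ref_size  # cm per reconstruction unit
    # Lengths scale linearly, so volumes scale with the cube of the factor.
    return unitless_volume * scale ** 3


# Hypothetical example: plate spans 0.5 units in the reconstruction,
# and we assume a 25 cm plate as the prior.
volume_cm3 = metric_volume(0.002, 0.5, 25.0)
print(f"Estimated volume: {volume_cm3:.1f} cm^3")
```

The cubic relationship is why scale ambiguity matters so much for portion estimation: in this sketch, a 10% error in the assumed plate diameter changes the volume estimate by roughly 33%.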
Taxonomy
Topics: Nutritional Studies and Diet · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
