Implicit-Scale 3D Reconstruction for Multi-Food Volume Estimation from Monocular Images

Yuhao Chen; Gautham Vinod; Siddeshwar Raghavan; Talha Ibn Mahmud; Bruce Coburn; Jinge Ma; Fengqing Zhu; Jiangpeng He

arXiv:2602.13041 · cs.CV · February 16, 2026

Open Access

TL;DR

This paper introduces a benchmark dataset for 3D food reconstruction from monocular images. The benchmark emphasizes implicit scale inference and geometric reasoning, with the goal of improving portion-estimation accuracy in realistic dining scenarios.

Contribution

The paper presents a new benchmark dataset and reframes food volume estimation as an implicit-scale 3D reconstruction problem, favoring geometric methods over appearance-based approaches.

Findings

1. Geometry-based methods outperform vision-language models in accuracy.
2. The best-performing methods reach a mean absolute percentage error (MAPE) of 0.21, i.e. about 21%, on volume estimation.
3. The benchmark highlights the challenges that scale ambiguity and occlusion pose in real-world scenes.
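The 0.21 figure is a mean absolute percentage error. This page does not give the benchmark's exact formula, so the sketch below assumes the common definition: the average of |predicted - actual| / |actual| over all items, reported as a fraction. The function name and the volumes in the usage line are illustrative assumptions, not benchmark code or data.

```python
from typing import Sequence


def mape(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean absolute percentage error as a fraction (0.21 means 21%).

    Assumes the common MAPE definition; the benchmark's exact variant is not
    stated on this page.
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need equal-length, non-empty sequences")
    if any(a == 0 for a in actual):
        raise ValueError("ground-truth values must be non-zero")
    # Per-item relative error, then the mean over all items.
    errors = [abs(p - a) / abs(a) for p, a in zip(predicted, actual)]
    return sum(errors) / len(errors)


# Hypothetical predicted vs. ground-truth volumes in mL (made-up numbers).
print(round(mape([120.0, 90.0, 250.0], [100.0, 100.0, 200.0]), 4))  # → 0.1833
```

Because each error is divided by the true volume, MAPE weights a 20 mL miss on a small side dish far more heavily than the same miss on a large main course.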

Abstract

We present Implicit-Scale 3D Reconstruction from Monocular Multi-Food Images, a benchmark dataset designed to advance geometry-based food portion estimation in realistic dining scenarios. Existing dietary assessment methods largely rely on single-image analysis or appearance-based inference, including recent vision-language models, which lack explicit geometric reasoning and are sensitive to scale ambiguity. This benchmark reframes food portion estimation as an implicit-scale 3D reconstruction problem under monocular observations. To reflect real-world conditions, explicit physical references and metric annotations are removed; instead, contextual objects such as plates and utensils are provided, requiring algorithms to infer scale from implicit cues and prior knowledge. The dataset emphasizes multi-food scenes with diverse object geometries, frequent occlusions, and complex spatial…
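The abstract's idea of inferring scale from contextual objects such as plates can be illustrated with a toy calculation. This is a minimal sketch under stated assumptions, not the authors' method. It assumes a reconstruction that is correct only up to an unknown global scale, a plate diameter already measured in those reconstruction units, and a prior plate diameter of 26 cm chosen purely for illustration. Both helper names are hypothetical.

```python
def metric_scale_from_reference(observed_size: float, assumed_true_size_cm: float) -> float:
    """Centimetres per reconstruction unit, from one reference object.

    Hypothetical helper: the true size is a prior assumption (e.g. a typical
    plate diameter), not a measured value.
    """
    if observed_size <= 0 or assumed_true_size_cm <= 0:
        raise ValueError("sizes must be positive")
    return assumed_true_size_cm / observed_size


def rescale_volume(volume_units3: float, scale_cm_per_unit: float) -> float:
    """Convert a volume from reconstruction units^3 to cm^3 (1 cm^3 = 1 mL).

    Volume scales with the cube of the linear scale factor.
    """
    return volume_units3 * scale_cm_per_unit ** 3


# Hypothetical scene: the plate rim spans 2.0 units in the up-to-scale
# reconstruction, and we assume (illustrative prior) a 26 cm dinner plate.
s = metric_scale_from_reference(2.0, 26.0)  # 13.0 cm per unit
food_volume_units3 = 0.1                    # made-up reconstructed food volume
print(round(rescale_volume(food_volume_units3, s), 1))  # → 219.7 (mL)
```

The cubic dependence is why scale ambiguity matters so much: for small errors, a given relative error in the assumed plate size becomes roughly three times that relative error in the estimated volume.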

Taxonomy

Topics: Nutritional Studies and Diet · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques