V-Nutri: Dish-Level Nutrition Estimation from Egocentric Cooking Videos

Chengkun Yue; Chuanzhi Xu; Jiangpeng He

arXiv:2604.11913 · cs.CV · April 15, 2026

TL;DR

V-Nutri uses cooking-process cues from egocentric videos to improve dish-level nutrition estimation. This addresses a key limitation of static image-based methods, which see only the finished dish.

Contribution

The paper introduces V-Nutri, a staged framework that combines visual backbones with cooking-process keyframes to estimate dish-level nutrition from egocentric videos.

Findings

01

Process cues improve nutrition estimation accuracy.

02

The size of the gain depends on backbone capacity and on the quality of cooking-event detection.

03

The HD-EPIC dataset is extended with manual annotations, establishing the first benchmark for video-based nutrition estimation.
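This summary does not say which metrics the benchmark reports. Assume it uses mean absolute error (MAE), a common choice for dish-level nutrition regression, optionally expressed as a percentage of the mean ground-truth value. Under that assumption, the two metrics can be sketched as follows. The function names and the calorie values are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth values."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def percent_mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """MAE expressed as a percentage of the mean ground-truth value."""
    error = mae(pred, target)  # validates inputs first
    mean_target = sum(target) / len(target)
    return 100.0 * error / mean_target


# Hypothetical per-dish calorie predictions and ground truth (kcal).
predicted_kcal = [520.0, 310.0, 745.0]
true_kcal = [480.0, 350.0, 700.0]
print(round(mae(predicted_kcal, true_kcal), 2))          # → 41.67
print(round(percent_mae(predicted_kcal, true_kcal), 2))  # → 8.17
```

The percentage form makes errors comparable across targets with very different scales, such as kilocalories and grams of protein.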

Abstract

Nutrition estimation of meals from visual data is an important problem for dietary monitoring and computational health, but existing approaches largely rely on a single image of the completed dish. This setting is fundamentally limited because many nutritionally relevant ingredients and transformations, such as oils, sauces, and mixed components, become visually ambiguous after cooking, making accurate calorie and macronutrient estimation difficult. In this paper, we investigate whether cooking-process information from egocentric cooking videos can contribute to dish-level nutrition estimation. First, we extend the HD-EPIC dataset with additional manual annotations and establish the first benchmark for video-based nutrition estimation. Most importantly, we propose V-Nutri, a staged framework that combines Nutrition5K-pretrained visual backbones with a lightweight fusion module that…
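The abstract is cut off before it describes the fusion module. The following is therefore only a hypothetical sketch of how a lightweight fusion stage could work. It assumes an earlier stage (for example, a frozen Nutrition5K-pretrained backbone) has already produced one feature vector for the final dish and one per process keyframe. The sketch mean-pools the keyframe vectors, concatenates the result with the dish vector, and applies a linear regression head. The function names, the mean-pooling choice, the four output targets (calories, fat, carbohydrate, protein), and the toy weights are all illustrative assumptions, not V-Nutri's published design.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Average per-keyframe embeddings into a single process embedding."""
    if not frames:
        raise ValueError("need at least one keyframe embedding")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def linear(x: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> Vector:
    """Compute y = W x + b, where W is given as one row per output."""
    if len(weights) != len(bias):
        raise ValueError("one bias term is needed per output row")
    for row in weights:
        if len(row) != len(x):
            raise ValueError("weight row length must match input dimension")
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def fuse_and_predict(dish_emb: Sequence[float],
                     keyframe_embs: Sequence[Sequence[float]],
                     weights: Sequence[Sequence[float]],
                     bias: Sequence[float]) -> Vector:
    """Hypothetical late fusion: [dish features ; pooled process features] -> linear head."""
    fused = list(dish_emb) + mean_pool(keyframe_embs)
    return linear(fused, weights, bias)


# Toy 2-D embeddings standing in for backbone outputs (illustrative values only).
dish = [1.0, 0.0]
keyframes = [[0.0, 2.0], [2.0, 0.0]]
# One row per assumed target: calories, fat, carbohydrate, protein.
W = [
    [100.0, 50.0, 20.0, 30.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 0.0, 1.0],
]
b = [10.0, 0.0, 5.0, 1.0]
print(fuse_and_predict(dish, keyframes, W, b))  # → [160.0, 3.0, 7.0, 4.0]
```

Mean pooling ignores keyframe order and treats every keyframe equally. A learned or attention-based weighting over keyframes is an obvious alternative, but this summary does not say which approach V-Nutri uses.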

Code & Models

Repositories

K624-YCK/V-Nutri (GitHub)
