Nutrition Estimation for Dietary Management: A Transformer Approach with Depth Sensing
Zhengyi Kwan, Wei Zhang, Zhengkui Wang, Aik Beng Ng, Simon See

TL;DR
NuNet, a transformer-based model that takes both RGB and depth food images as input, significantly improves nutrition estimation accuracy. Its multi-scale architecture and feature fusion yield a 15.65% error rate, the lowest the authors are aware of.
Contribution
Introduces NuNet, a novel transformer-based network with multi-scale encoding and fusion modules for accurate nutrition estimation from food images.
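The page does not say how NuNet's multi-scale encoder forms its scales. One generic way to give a transformer multi-scale input is to average-pool a feature map at several strides and flatten each pooled level into a token sequence. The sketch below assumes that approach on a single-channel map of scalars for readability. The function names `avg_pool2d` and `multi_scale_tokens` and the strides (1, 2, 4) are hypothetical and do not describe NuNet's actual encoder.

```python
def avg_pool2d(grid, stride):
    """Non-overlapping average pooling over a 2-D grid of scalars.

    Assumes height and width are divisible by `stride` (hypothetical simplification).
    """
    h, w = len(grid), len(grid[0])
    if h % stride or w % stride:
        raise ValueError("grid size must be divisible by stride")
    pooled = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            # Gather the stride x stride block and average it.
            block = [grid[i + di][j + dj] for di in range(stride) for dj in range(stride)]
            row.append(sum(block) / len(block))
        pooled.append(row)
    return pooled


def multi_scale_tokens(grid, strides=(1, 2, 4)):
    """Return one flattened token sequence per scale, keyed by stride (finest first)."""
    return {s: [v for row in avg_pool2d(grid, s) for v in row] for s in strides}


# Usage: an 8x8 map yields 64, 16 and 4 tokens at strides 1, 2 and 4.
grid = [[float(i * 8 + j) for j in range(8)] for i in range(8)]
tokens = multi_scale_tokens(grid)
```

Coarser levels summarize larger regions with fewer tokens, which is the usual reason multi-scale designs trade off detail against attention cost; a real encoder would pool multi-channel learned features rather than raw scalars.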
Findings
Achieves a 15.65% error rate, the lowest known to the authors.
Outperforms existing nutrition estimation methods.
Demonstrates effective multi-modal data fusion.
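The summary does not define how the 15.65% error rate is computed. One common convention in food nutrition estimation is percentage mean absolute error: the MAE for each nutritional factor divided by the mean ground-truth value for that factor, then averaged over factors. The sketch below assumes that definition and assumes the five factors are calories, mass, fat, carbohydrate and protein; neither assumption is confirmed by this page, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical factor names; the page only says "five nutritional factors".
FACTORS = ("calories", "mass", "fat", "carb", "protein")


def percentage_mae(preds, targets):
    """Percentage MAE for one factor: MAE divided by the mean ground truth, times 100."""
    if len(preds) != len(targets) or not targets:
        raise ValueError("preds and targets must be non-empty and of equal length")
    mae = mean(abs(p - t) for p, t in zip(preds, targets))
    return 100.0 * mae / mean(targets)


def mean_percentage_error(pred_table, target_table):
    """Average the per-factor percentage MAE over all factors (assumed aggregation)."""
    per_factor = {f: percentage_mae(pred_table[f], target_table[f]) for f in FACTORS}
    return per_factor, mean(per_factor.values())


# Usage: predictions off by 10 and 20 on targets 100 and 200 give MAE 15 on a mean of 150.
targets = {f: [100.0, 200.0] for f in FACTORS}
preds = {f: [110.0, 180.0] for f in FACTORS}
per_factor, overall = mean_percentage_error(preds, targets)  # overall → 10.0
```

Normalizing by the mean ground truth makes errors on factors with very different units (kcal versus grams) comparable before averaging.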
Abstract
Nutrition estimation is crucial for effective dietary management and for overall health and well-being. Existing methods often suffer from sub-optimal accuracy and can be time-consuming. In this paper, we propose NuNet, a transformer-based network for nutrition estimation that uses both RGB and depth information from food images. We designed and implemented a multi-scale encoder and decoder, along with two types of feature fusion modules, specialized for estimating five nutritional factors. These modules balance the efficiency and effectiveness of feature extraction through flexible use of our customized attention mechanisms and fusion strategies. Our experimental study shows that NuNet significantly outperforms both its variants and existing solutions for nutrition estimation. It achieves an error rate of 15.65%, the lowest known to us, largely due to our…
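The abstract states that NuNet fuses RGB and depth features with customized attention mechanisms and predicts five nutritional factors, but it does not specify either design. As a generic illustration only, the sketch below shows cross-attention fusion in which RGB tokens query depth tokens, a residual sum, mean pooling, and one linear regression head per factor. Every name, dimension and weight here is hypothetical, and the learned query/key/value projections of a real attention layer are omitted for brevity; this is not NuNet's fusion module.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all key/value tokens."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        dim_v = len(values[0])
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return out


def fuse_rgb_depth(rgb_tokens, depth_tokens):
    """RGB tokens query depth tokens; the attended depth context is added back residually."""
    attended = cross_attention(rgb_tokens, depth_tokens, depth_tokens)
    return [[r + a for r, a in zip(rt, at)] for rt, at in zip(rgb_tokens, attended)]


def mean_pool(tokens):
    """Average the token sequence into a single feature vector."""
    n = len(tokens)
    return [sum(t[j] for t in tokens) / n for j in range(len(tokens[0]))]


def predict_nutrients(rgb_tokens, depth_tokens, heads):
    """`heads` maps a factor name to (weights, bias) for a hypothetical linear head."""
    pooled = mean_pool(fuse_rgb_depth(rgb_tokens, depth_tokens))
    return {
        name: sum(w * x for w, x in zip(weights, pooled)) + bias
        for name, (weights, bias) in heads.items()
    }


# Usage with random stand-in features: 4 RGB tokens, 6 depth tokens, dimension 8.
rng = random.Random(0)
DIM = 8
rgb = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(4)]
depth = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(6)]
heads = {
    name: ([rng.uniform(-0.1, 0.1) for _ in range(DIM)], 0.0)
    for name in ("calories", "mass", "fat", "carb", "protein")
}
estimates = predict_nutrients(rgb, depth, heads)
```

Letting one modality query the other is a common way to inject geometric cues (here, depth) into appearance features while keeping the RGB token layout unchanged.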
Taxonomy
Topics: Nutritional Studies and Diet
