Investigating the Impact of Large-Scale Pre-training on Nutritional Content Estimation from 2D Images
Michele Andrade, Guilherme A. L. Silva, Valéria Santos, Gladston Moreira, Eduardo Luz

TL;DR
This study evaluates how large-scale pre-training datasets influence the accuracy of deep learning models in estimating food nutrition from 2D images, emphasizing the importance of dataset quality and relevance.
Contribution
The study shows that pre-training on a large, high-quality dataset such as JFT-300M improves nutritional estimation, whereas pre-training on a larger but less relevant dataset such as COYO may hinder it.
Findings
Models pre-trained on JFT-300M outperform others.
Pre-training on the COYO dataset yields worse performance than pre-training on ImageNet.
Dataset characteristics critically affect transfer learning success.
Abstract
Estimating the nutritional content of food from images is a critical task with significant implications for health and dietary monitoring. This is challenging, especially when relying solely on 2D images, due to the variability in food presentation, lighting, and the inherent difficulty in inferring volume and mass without depth information. Furthermore, reproducibility in this domain is hampered by the reliance of state-of-the-art methods on proprietary datasets for large-scale pre-training. In this paper, we investigate the impact of large-scale pre-training datasets on the performance of deep learning models for nutritional estimation using only 2D images. We fine-tune and evaluate Vision Transformer (ViT) models pre-trained on two large public datasets, ImageNet and COYO, comparing their performance against baseline CNN models (InceptionV2 and ResNet-50) and a state-of-the-art…
