MM-Food-100K: A 100,000-Sample Multimodal Food Intelligence Dataset with Verifiable Provenance

Yi Dong, Yusuke Muraoka, Scott Shi, and Yi Zhang

arXiv:2508.10429·cs.AI·August 15, 2025

TL;DR

This paper introduces MM-Food-100K, a large, multimodal food dataset with verifiable provenance, designed for training and evaluating vision-language models in food recognition and nutrition prediction tasks.

Contribution

The paper presents a new 100,000-sample multimodal food dataset with verifiable provenance, describes its collection methodology, and demonstrates its utility by fine-tuning large vision-language models for nutrition prediction.

Findings

01

Fine-tuning large vision-language models on MM-Food-100K yields consistent gains over out-of-box baselines on image-based nutrition prediction.

02

The dataset enables effective training of vision-language models for food recognition.

03

The dataset is publicly available with a portion reserved for commercial use.

Abstract

We present MM-Food-100K, a public 100,000-sample multimodal food intelligence dataset with verifiable provenance. It is a curated approximately 10% open subset of an original 1.2 million, quality-accepted corpus of food images annotated for a wide range of information (such as dish name, region of creation). The corpus was collected over six weeks from over 87,000 contributors using the Codatta contribution model, which combines community sourcing with configurable AI-assisted quality checks; each submission is linked to a wallet address in a secure off-chain ledger for traceability, with a full on-chain protocol on the roadmap. We describe the schema, pipeline, and QA, and validate utility by fine-tuning large vision-language models (ChatGPT 5, ChatGPT OSS, Qwen-Max) on image-based nutrition prediction. Fine-tuning yields consistent gains over out-of-box baselines across standard…
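
The abstract states that each submission is linked to a wallet address in an off-chain ledger for traceability, but the excerpt here does not say how that ledger is built. The following is a minimal sketch of one way such a link could be recorded, not the authors' implementation. The `Submission` fields, the `ProvenanceLedger` class, and the hash-chaining scheme are all assumptions introduced for illustration.

```python
import hashlib
import json
from dataclasses import dataclass, asdict

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


@dataclass(frozen=True)
class Submission:
    # Hypothetical fields; the real MM-Food-100K schema is not given in this excerpt.
    image_sha256: str    # content hash of the submitted image
    wallet_address: str  # contributor's wallet address
    dish_name: str       # one example annotation


class ProvenanceLedger:
    """Append-only, hash-chained log held off-chain (illustrative assumption)."""

    def __init__(self):
        self.entries = []

    def append(self, sub: Submission) -> str:
        # Each entry commits to the previous entry's hash, so later edits are detectable.
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        payload = json.dumps(asdict(sub), sort_keys=True)
        entry_hash = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        self.entries.append(
            {"prev_hash": prev_hash, "payload": payload, "entry_hash": entry_hash}
        )
        return entry_hash

    def verify(self) -> bool:
        # Recompute the chain from the start and compare against stored hashes.
        prev_hash = GENESIS
        for entry in self.entries:
            expected = hashlib.sha256(
                (prev_hash + entry["payload"]).encode("utf-8")
            ).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["entry_hash"] != expected:
                return False
            prev_hash = entry["entry_hash"]
        return True

    def contributor_of(self, image_sha256: str):
        # Trace an image back to the wallet that submitted it, or None if unknown.
        for entry in self.entries:
            record = json.loads(entry["payload"])
            if record["image_sha256"] == image_sha256:
                return record["wallet_address"]
        return None
```

In this sketch, `verify()` returns `False` if any stored payload is altered after the fact, which is one simple way to make an off-chain record tamper-evident before anything is anchored on-chain.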
