Enhancing Action and Ingredient Modeling for Semantically Grounded Recipe Generation

Guoshan Liu; Bin Zhu; Yian Li; Jingjing Chen; Chong-Wah Ngo; Yu-Gang Jiang

arXiv:2602.15862 · cs.CL · February 19, 2026

Open Access

TL;DR

This paper introduces a semantically grounded framework for recipe generation that improves the accuracy of actions and ingredients in generated recipes by combining supervised and reinforcement fine-tuning, along with a validation module.

Contribution

The paper proposes a two-stage pipeline (supervised fine-tuning followed by reinforcement fine-tuning) with a semantic validation module for recipe generation, improving semantic fidelity over previous multimodal models.

Findings

01 Achieves state-of-the-art performance on Recipe1M.

02 Markedly improves the semantic accuracy of predicted actions and ingredients.

03 The SCSR module filters and corrects action and ingredient predictions.

Abstract

Recent advances in Multimodal Large Language Models (MLLMs) have enabled recipe generation from food images, yet outputs often contain semantically incorrect actions or ingredients despite high lexical scores (e.g., BLEU, ROUGE). To address this gap, we propose a semantically grounded framework that predicts and validates actions and ingredients as internal context for instruction generation. Our two-stage pipeline combines supervised fine-tuning (SFT) with reinforcement fine-tuning (RFT): SFT builds foundational accuracy using an Action-Reasoning dataset and ingredient corpus, while RFT employs frequency-aware rewards to improve long-tail action prediction and ingredient generalization. A Semantic Confidence Scoring and Rectification (SCSR) module further filters and corrects predictions. Experiments on Recipe1M show state-of-the-art performance and markedly improved semantic fidelity.
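
The abstract does not give the exact form of the frequency-aware reward, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a reward computed as a weighted F1 over action sets, where each action's weight grows with its rarity in a training corpus. The function names, the log inverse-frequency weighting, and the smoothing constant are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def inverse_frequency_weights(action_corpus, smoothing=1.0):
    """Assign each action a weight that grows as the action gets rarer.

    Hypothetical weighting: 1 + log of the smoothed inverse relative frequency.
    """
    counts = Counter(action_corpus)
    total = sum(counts.values())
    return {
        action: 1.0 + math.log((total + smoothing) / (count + smoothing))
        for action, count in counts.items()
    }


def frequency_aware_reward(predicted, reference, weights, default_weight=1.0):
    """Weighted F1 between predicted and reference action sets.

    Matching a rare (long-tail) action contributes more than matching a
    common one. Actions missing from `weights` fall back to `default_weight`.
    """
    pred, ref = set(predicted), set(reference)
    if not pred or not ref:
        return 0.0

    def weight(action):
        return weights.get(action, default_weight)

    hit = sum(weight(a) for a in pred & ref)
    precision = hit / sum(weight(a) for a in pred)
    recall = hit / sum(weight(a) for a in ref)
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy corpus: "mix" is common, "julienne" is a long-tail action.
corpus = ["mix"] * 90 + ["bake"] * 9 + ["julienne"] * 1
weights = inverse_frequency_weights(corpus)

reference = ["mix", "julienne"]
reward_common_only = frequency_aware_reward(["mix"], reference, weights)
reward_rare_only = frequency_aware_reward(["julienne"], reference, weights)
```

Under this weighting, a prediction that recovers only the rare action scores higher than one that recovers only the common action, which is the kind of pressure toward long-tail actions that the abstract attributes to the RFT stage.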

Taxonomy

Topics: Multimodal Machine Learning Applications · Nutritional Studies and Diet · Generative Adversarial Networks and Image Synthesis