Mitigating Cross-modal Representation Bias for Multicultural Image-to-Recipe Retrieval

Qing Wang; Chong-Wah Ngo; Yu Cao; Ee-Peng Lim

arXiv:2510.20393·cs.CV·October 24, 2025

TL;DR

This paper introduces a causal approach to improve image-to-recipe retrieval by addressing biases in cross-modal representations, especially for multicultural cuisines, leading to better retrieval of subtle culinary details.

Contribution

It proposes a novel causal method that predicts overlooked culinary elements and explicitly incorporates them into cross-modal learning, enhancing retrieval accuracy across diverse cuisines.

Findings

1. Improved retrieval performance on Recipe1M and multicultural datasets.
2. Effective uncovering of subtle ingredients and cooking actions.
3. Mitigation of representation bias in multicultural cuisine retrieval.

Abstract

Existing approaches to image-to-recipe retrieval implicitly assume that a food image fully captures the details documented in its recipe. However, a food image reflects only the visual outcome of a cooked dish, not the underlying cooking process. Consequently, cross-modal representations learned to bridge the modality gap between images and recipes tend to ignore subtle, recipe-specific details that are not visually apparent but are crucial for recipe retrieval. Specifically, the representations are biased toward the dominant visual elements, which makes it difficult to rank similar recipes that differ subtly in their ingredients and cooking methods. This bias is expected to be more severe when the training data mixes images and recipes sourced from different cuisines. This paper proposes a novel causal approach…
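The ranking difficulty described in the abstract can be illustrated with a toy sketch. This is not the paper's model: the three-dimensional embeddings, the recipe names, and the helper functions `cosine` and `rank_recipes` are all hypothetical, chosen only to show what happens when an image embedding carries no signal on a dimension that separates two visually similar recipes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_recipes(image_vec, recipe_vecs):
    """Return (name, score) pairs sorted by descending similarity to the image."""
    scores = {name: cosine(image_vec, vec) for name, vec in recipe_vecs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical embeddings (assumption, not from the paper):
# dims 0-1 stand for dominant visual content, dim 2 for a subtle detail
# that is written in the recipe but not visible in the photo.
recipes = {
    "fried_rice_with_fish_sauce": [0.9, 0.4, 0.3],
    "fried_rice_with_soy_sauce": [0.9, 0.4, -0.3],
    "beef_stew": [0.1, 0.9, 0.0],
}

# The image embedding only encodes what is visible, so dim 2 is zero.
image = [0.9, 0.4, 0.0]

ranking = rank_recipes(image, recipes)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

In this toy setup the two fried-rice recipes receive the same score, so their relative order is arbitrary, while the visually different dish is ranked last. An approach that recovers the missing dimension from the image, as the abstract motivates, would be needed to break the tie.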

Taxonomy

Topics: Nutritional Studies and Diet · Image Retrieval and Classification Techniques · Advanced Image and Video Retrieval Techniques