From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation

Jeeho Shin; Kyungho Kim; Kijung Shin

arXiv:2511.19176 · cs.LG · April 23, 2026

TL;DR

This paper introduces TESMR, a three-stage framework that progressively refines raw multimodal features into effective recipe embeddings, achieving 7-15% higher Recall@10 than existing methods on two real-world datasets.

Contribution

The paper presents TESMR, a three-stage method that turns raw multimodal features into embeddings through content-based, relation-based, and learning-based enhancement, improving recipe recommendation accuracy.

Findings

1. TESMR achieves 7-15% higher Recall@10 than existing methods on two real-world datasets.

2. Even simple uses of multimodal signals yield competitive recommendation performance.

3. Because simple signals already perform well, systematically enhancing multimodal features is a promising direction.
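
Since the headline result is stated in Recall@10, a minimal sketch of how that metric is commonly computed may help. The function names and the choice to divide by the number of held-out relevant items are assumptions made for illustration; the paper's exact evaluation protocol is not given on this page.

```python
def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of a user's held-out relevant items found in the top-k ranking.

    Assumption: the denominator is the number of relevant items; some
    evaluations divide by min(k, number of relevant items) instead.
    """
    if not relevant_items:
        return 0.0
    top_k = ranked_items[:k]
    hits = sum(1 for item in top_k if item in relevant_items)
    return hits / len(relevant_items)


def mean_recall_at_k(rankings, ground_truth, k=10):
    """Average Recall@k over users that have at least one held-out item."""
    users = [u for u, items in ground_truth.items() if items]
    if not users:
        return 0.0
    return sum(recall_at_k(rankings[u], ground_truth[u], k) for u in users) / len(users)


# Hypothetical toy data: two users, ranked recipe IDs, and held-out recipes.
rankings = {"u1": ["r3", "r1", "r7"], "u2": ["r2", "r9", "r4"]}
ground_truth = {"u1": {"r1", "r8"}, "u2": {"r4"}}
score = mean_recall_at_k(rankings, ground_truth, k=2)
```

With `k=2`, user `u1` recovers one of two held-out recipes and `u2` recovers none, so the mean is 0.25.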

Abstract

Recipe recommendation has become an essential task in web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our analysis shows that even simple uses of multimodal signals yield competitive performance, suggesting that systematic enhancement of these signals is highly promising. We propose TESMR, a 3-stage framework for recipe recommendation that progressively refines raw multimodal features into effective embeddings through: (1) content-based enhancement using foundation models with multimodal comprehension, (2) relation-based enhancement via message propagation over user-recipe interactions, and (3) learning-based enhancement through contrastive learning with learnable embeddings. Experiments on two real-world datasets show that TESMR outperforms existing methods, achieving 7-15% higher Recall@10.
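
Stages (2) and (3) can be pictured with a small NumPy sketch. This is not TESMR's implementation: the abstract does not specify the propagation rule, the contrastive loss, or any hyperparameters, so mean aggregation over the user-recipe graph and an InfoNCE-style loss are used here as stand-ins, and every name (`propagate`, `info_nce`, `temperature`) is hypothetical. The input `item_feats` stands for stage-1 features produced by a foundation model.

```python
import numpy as np


def propagate(item_feats, interactions, num_layers=2):
    """Mean-aggregation message passing over a user-recipe bipartite graph.

    item_feats:   (n_items, d) content features (assumed stage-1 output).
    interactions: (n_users, n_items) binary interaction matrix.
    num_layers:   number of propagation rounds, must be >= 1.
    Note: in this simplified version, recipes with no interactions end up
    with zero vectors.
    """
    R = interactions.astype(float)
    user_deg = np.maximum(R.sum(axis=1, keepdims=True), 1.0)    # (n_users, 1)
    item_deg = np.maximum(R.sum(axis=0, keepdims=True).T, 1.0)  # (n_items, 1)
    items = item_feats
    for _ in range(num_layers):
        users = (R @ items) / user_deg    # each user averages their recipes
        items = (R.T @ users) / item_deg  # each recipe averages its users
    return users, items


def info_nce(anchor, positive, temperature=0.2):
    """InfoNCE-style loss: row i of `anchor` should match row i of `positive`."""
    a = anchor / (np.linalg.norm(anchor, axis=1, keepdims=True) + 1e-12)
    p = positive / (np.linalg.norm(positive, axis=1, keepdims=True) + 1e-12)
    logits = (a @ p.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy example: 2 users, 3 recipes, 4-dimensional features.
rng = np.random.default_rng(0)
R = np.array([[1, 1, 0], [0, 1, 1]])
feats = rng.normal(size=(3, 4))
users, items = propagate(feats, R)
learnable = rng.normal(size=(3, 4))  # stand-in for learnable recipe embeddings
loss = info_nce(learnable, items)    # aligns learnable and propagated views
```

In a real system the learnable embeddings would be trained by gradient descent on a loss of this kind together with a recommendation objective; the sketch only evaluates the loss once to show the data flow.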
