Epicure: Navigating the Emergent Geometry of Food Ingredient Embeddings
Jakub Radzikowski, Josef Chen

TL;DR
Epicure introduces multilingual ingredient embeddings trained on a large recipe corpus, revealing emergent geometric structures that reflect culinary and chemical relationships.
Contribution
The paper presents a multilingual ingredient embedding approach trained on a large recipe dataset and explores the emergent geometry of the resulting embeddings under different random-walk schemas.
Findings
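Three sibling embedding models (Cooc, Chem, and Core) share architecture and hyperparameters but differ in random-walk schema, so each captures a different aspect of ingredient relationships.
A multilingual corpus of 4.14M recipes from 11 sources was assembled, with raw ingredient strings normalised to 1,790 canonical entries, and used for training.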
The models reveal meaningful culinary and chemical structures in ingredient space.
Abstract
We present Epicure, a family of three sibling skip-gram ingredient embeddings retrained from scratch on a multilingual recipe corpus. We aggregate 4.14M recipes from 11 sources spanning seven languages (English, Chinese, Russian, Vietnamese, Spanish, Turkish, Indonesian, German, and Indian-English) and normalise the raw ingredient strings to 1,790 canonical entries via an LLM-augmented pipeline. A 203,508-edge ingredient-ingredient NPMI graph and an 80,019-edge typed FlavorDB ingredient-compound graph (2,247 typed compound nodes across 15 categories) seed three Metapath2Vec variants that share architecture and hyperparameters and differ only in the random-walk schema: Cooc walks the co-occurrence graph only, Chem walks the typed compound metapaths only, and Core blends both via injected ingredient-ingredient walks at controlled mixing, placing each model at a distinct point on the…
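To make the ingredient-ingredient NPMI graph mentioned in the abstract more concrete, here is a minimal sketch of how normalised pointwise mutual information can be computed from recipe co-occurrence. It is not the authors' pipeline: the function name `npmi_edges`, the `min_npmi` threshold, and the toy recipes are all assumptions made for illustration, and the paper's actual edge-filtering rule is not stated in the text above. NPMI is taken here in its standard form, PMI divided by the negative log of the joint probability, with probabilities estimated as the fraction of recipes containing an ingredient or a pair.

```python
import math
from collections import Counter
from itertools import combinations


def npmi_edges(recipes, min_npmi=0.0):
    """Return {(a, b): npmi} for ingredient pairs seen together in a recipe.

    `min_npmi` is a hypothetical cut-off; the source does not say how
    edges were selected.
    """
    n = len(recipes)
    single = Counter()  # number of recipes containing each ingredient
    pair = Counter()    # number of recipes containing each unordered pair
    for recipe in recipes:
        items = sorted(set(recipe))  # dedupe, fix pair order
        single.update(items)
        pair.update(combinations(items, 2))

    edges = {}
    for (a, b), count_ab in pair.items():
        p_ab = count_ab / n
        if p_ab == 1.0:
            # Pair occurs in every recipe: -log(p_ab) is 0, so use the
            # conventional limiting value of 1.
            score = 1.0
        else:
            pmi = math.log(p_ab / ((single[a] / n) * (single[b] / n)))
            score = pmi / -math.log(p_ab)
        if score > min_npmi:
            edges[(a, b)] = score
    return edges


# Toy data, invented for this example only.
toy_recipes = [
    ["tomato", "basil", "garlic"],
    ["tomato", "basil"],
    ["garlic", "ginger", "soy sauce"],
    ["ginger", "soy sauce"],
]
edges = npmi_edges(toy_recipes)
for (a, b), score in sorted(edges.items()):
    print(f"{a} -- {b}: {score:.3f}")
```

On this toy data only the pairs that always appear together (basil with tomato, ginger with soy sauce) survive the threshold; pairs such as garlic with tomato co-occur exactly as often as chance predicts, so their NPMI is zero and they are dropped. A graph of this kind is what a co-occurrence-only random walk would traverse.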
