Predefined domain specific embeddings of food concepts and recipes: A case study on heterogeneous recipe datasets
Gordana Ispirova, Tome Eftimov, and Barbara Koroušić Seljak

TL;DR
This paper develops domain-specific embeddings for food concepts and recipes by normalizing and merging heterogeneous datasets, improving machine learning performance in nutrient prediction tasks.
Contribution
It introduces a method for creating unified ingredient and recipe embeddings from diverse datasets, enhancing ML applications in food data analysis.
Findings
Merged embeddings outperform baseline models
Normalization improves data consistency
Domain-specific embeddings enhance nutrient prediction accuracy
Abstract
Although recipe data are easy to come by nowadays, it is hard to find a complete recipe dataset: one with a list of ingredients, nutrient values per ingredient and per recipe, allergens, etc. Recipe datasets are usually collected from social media websites where users post and publish recipes, usually written with little to no structure and using both standardized and non-standardized units of measurement. We collect six publicly available recipe datasets in different formats, some including data in different languages. Bringing all of these datasets into the format needed to apply a machine learning (ML) pipeline for nutrient prediction [1], [2] involves data normalization using dictionary-based named entity recognition (NER), rule-based NER, and conversions using external domain-specific resources. From the list of ingredients, domain-specific…
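To make the normalization step in the abstract more concrete, here is a minimal Python sketch of how a free-text ingredient line might be normalized with a rule-based pattern for quantities and units plus a dictionary lookup for ingredient names. It is an illustration only, not the authors' pipeline: the `UNIT_TO_GRAMS` and `INGREDIENT_DICT` tables, the `QUANTITY_RE` pattern, and the `normalize_line` function are all hypothetical stand-ins for the external domain-specific resources the abstract mentions.

```python
import re

# Hypothetical lookup tables (assumptions for illustration); a real pipeline
# would draw these from external domain-specific resources.
UNIT_TO_GRAMS = {"g": 1.0, "kg": 1000.0, "oz": 28.35, "lb": 453.59}
INGREDIENT_DICT = {"flour", "sugar", "butter", "olive oil"}

# Rule-based pattern: a number, a known mass unit, then the remaining text.
QUANTITY_RE = re.compile(
    r"^\s*(\d+(?:\.\d+)?)\s*(kg|g|oz|lb)\b\s*(.*)$", re.IGNORECASE
)


def normalize_line(line):
    """Return the matched ingredient and its mass in grams, or None if the
    line does not start with a quantity and a known unit."""
    m = QUANTITY_RE.match(line)
    if m is None:
        return None
    amount = float(m.group(1))
    unit = m.group(2).lower()
    rest = m.group(3).lower()
    # Dictionary-based matching: keep the longest dictionary entry that
    # appears as a whole word or phrase in the remaining text.
    matches = [
        name for name in INGREDIENT_DICT
        if re.search(r"\b" + re.escape(name) + r"\b", rest)
    ]
    ingredient = max(matches, key=len) if matches else None
    return {"ingredient": ingredient, "grams": amount * UNIT_TO_GRAMS[unit]}
```

Under these assumptions, `normalize_line("2 kg flour")` yields a record for flour at 2000 grams, while a line with no recognizable unit, such as `"a pinch of salt"`, returns `None` and would need a separate rule or a manual conversion.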
