Predefined domain specific embeddings of food concepts and recipes: A case study on heterogeneous recipe datasets

Gordana Ispirova, Tome Eftimov, and Barbara Koroušić Seljak

arXiv:2302.01005·cs.CL·February 3, 2023

TL;DR

This paper develops domain-specific embeddings for food concepts and recipes by normalizing and merging heterogeneous datasets, improving machine learning performance in nutrient prediction tasks.

Contribution

It introduces a method for creating unified ingredient and recipe embeddings from diverse datasets, enhancing ML applications in food data analysis.

Findings

01. Merged embeddings outperform baseline models

02. Normalization improves data consistency

03. Domain-specific embeddings enhance nutrient prediction accuracy

Abstract

Although recipe data are easy to come by nowadays, it is hard to find a complete recipe dataset, one with a list of ingredients, nutrient values per ingredient and per recipe, allergens, etc. Recipe datasets are usually collected from social media websites where users post and publish recipes. The recipes are usually written with little to no structure, using both standardized and non-standardized units of measurement. We collect six publicly available recipe datasets in different formats, some of which include data in different languages. Bringing all of these datasets into the format needed to apply a machine learning (ML) pipeline for nutrient prediction [1], [2] involves data normalization using dictionary-based named entity recognition (NER), rule-based NER, and conversions using external domain-specific resources. From the list of ingredients, domain-specific…
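To make the normalization step in the abstract more concrete, here is a minimal sketch of how a rule-based pattern, a dictionary lookup, and a unit conversion could be combined to turn a free-text ingredient line into an ingredient name and a quantity in grams. It is not the authors' pipeline: the function name `normalize_line`, the regular expression, the tiny ingredient dictionary, and the unit table are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical unit table (grams per unit); illustrative, not taken from the paper.
UNIT_TO_GRAMS = {"g": 1.0, "kg": 1000.0, "oz": 28.35, "lb": 453.59}

# Hypothetical aliases mapping spelled-out units to their canonical symbol.
UNIT_ALIASES = {
    "gram": "g", "grams": "g",
    "kilogram": "kg", "kilograms": "kg",
    "ounce": "oz", "ounces": "oz",
    "pound": "lb", "pounds": "lb", "lbs": "lb",
}

# Hypothetical ingredient dictionary used for dictionary-based matching.
INGREDIENT_DICT = {"flour", "sugar", "butter", "olive oil"}

# Rule-based pattern: a number, a unit word, an optional "of", then free text.
LINE_RE = re.compile(
    r"^\s*(?P<qty>\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z]+)\s+(?:of\s+)?(?P<rest>.+)$"
)


def normalize_line(line):
    """Return {'ingredient', 'grams'} for a recognized line, else None."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # no leading quantity and unit, e.g. "a pinch of salt"

    unit = match.group("unit").lower()
    unit = UNIT_ALIASES.get(unit, unit)
    if unit not in UNIT_TO_GRAMS:
        return None  # unit is not in the conversion table, e.g. "cups"

    rest = match.group("rest").lower()
    # Dictionary-based matching: keep the longest known name found in the text.
    hits = [
        name for name in INGREDIENT_DICT
        if re.search(r"\b" + re.escape(name) + r"\b", rest)
    ]
    if not hits:
        return None

    grams = float(match.group("qty")) * UNIT_TO_GRAMS[unit]
    return {"ingredient": max(hits, key=len), "grams": grams}
```

Under these assumptions, `normalize_line("2 kg of flour")` yields flour at 2000 grams, while lines with volume units or no explicit quantity return `None`. Handling those cases would need the kind of external domain-specific resources the abstract mentions, such as per-ingredient density data.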
