Ask the experts: sourcing high-quality datasets for nutritional counselling through Human-AI collaboration
Simone Balloccu, Ehud Reiter, Vivek Kumar, Diego Reforgiato Recupero, and Daniele Riboni

TL;DR
This paper demonstrates a human-AI collaborative approach to creating a high-quality nutrition counselling dataset, combining crowd-sourcing, expert input, and ChatGPT, and highlights both the potential and the risks of that approach.
Contribution
It introduces HAI-coaching, the first expert-annotated nutrition counselling dataset, and presents a methodology for creating datasets in low-resource domains by combining LLMs with human expertise.
Findings
ChatGPT generates fluent, human-like supportive texts.
Generated texts can contain harmful biases, especially in sensitive topics.
The dataset enables future research in nutrition counselling AI applications.
Abstract
Large Language Models (LLMs), with their flexible generation abilities, can be powerful data sources in domains with few or no available corpora. However, problems like hallucinations and biases limit such applications. In this case study, we pick nutrition counselling, a domain lacking any public resource, and show that high-quality datasets can be gathered by combining LLMs, crowd-workers and nutrition experts. We first crowd-source and cluster a novel dataset of diet-related issues, then work with experts to prompt ChatGPT into producing related supportive text. Finally, we let the experts evaluate the safety of the generated text. We release HAI-coaching, the first expert-annotated nutrition counselling dataset containing ~2.4K dietary struggles from crowd workers, and ~97K related supportive texts generated by ChatGPT. Extensive analysis shows that ChatGPT while producing highly…
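The pipeline described in the abstract (crowd-sourced struggles, clustering, ChatGPT-generated supportive text, expert safety review) can be pictured as a simple record structure. The sketch below is only an illustration under assumptions: the class names, field names, cluster label, and example strings are all hypothetical and are not taken from the HAI-coaching release or its actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class SupportiveText:
    """One generated candidate (hypothetical schema, not the released format)."""
    text: str
    # None means no expert has reviewed the candidate yet.
    expert_safe: Optional[bool] = None


@dataclass
class Struggle:
    """One crowd-sourced dietary struggle with its cluster label and candidates."""
    text: str
    cluster: str
    candidates: List[SupportiveText] = field(default_factory=list)


def safe_pairs(struggles: List[Struggle]) -> List[Tuple[str, str]]:
    """Keep only (struggle, supportive text) pairs an expert explicitly marked safe."""
    return [
        (s.text, c.text)
        for s in struggles
        for c in s.candidates
        if c.expert_safe is True
    ]


# Invented example data, for illustration only.
example = Struggle(
    text="I snack late at night when I am stressed.",
    cluster="emotional eating",  # hypothetical cluster label
    candidates=[
        SupportiveText("It sounds like stress makes evenings hard.", expert_safe=True),
        SupportiveText("You just need more willpower.", expert_safe=False),
        SupportiveText("Many people find evenings challenging."),  # not yet reviewed
    ],
)

pairs = safe_pairs([example])
```

Filtering on `expert_safe is True` rather than on truthiness keeps unreviewed candidates out of the result, which mirrors the idea that generated text should be used only after an expert has judged it safe.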
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
