When LLMs Can't Help: Real-World Evaluation of LLMs in Nutrition
Karen Jia-Hui Li, Simone Balloccu, Ondrej Dusek, and Ehud Reiter

TL;DR
This study conducts the first randomized controlled trial to evaluate the real-world effectiveness of LLM-enhanced chatbots in nutrition, revealing a gap between intrinsic performance and practical impact.
Contribution
The paper introduces a rigorous RCT assessing LLM features in a nutrition chatbot, highlighting the discrepancy between intrinsic evaluation success and real-world effectiveness.
Findings
LLM features improved intrinsic evaluation metrics
No consistent benefits were observed in dietary or emotional outcomes
The results highlight the importance of human-centered evaluation approaches
Abstract
The increasing trust in large language models (LLMs), especially in the form of chatbots, is often undermined by the lack of their extrinsic evaluation. This holds particularly true in nutrition, where randomised controlled trials (RCTs) are the gold standard, and experts demand them for evidence-based deployment. LLMs have shown promising results in this field, but these are limited to intrinsic setups. We address this gap by running the first RCT involving LLMs for nutrition. We augment a rule-based chatbot with two LLM-based features: (1) message rephrasing for conversational variety and engagement, and (2) nutritional counselling through a fine-tuned model. In our seven-week RCT (n=81), we compare chatbot variants with and without LLM integration. We measure effects on dietary outcome, emotional well-being, and engagement. Despite our LLM-based features performing well in intrinsic…
Taxonomy
Topics: AI in Service Interactions · Digital Mental Health Interventions · Artificial Intelligence in Healthcare and Education
