Are Today's LLMs Ready to Explain Well-Being Concepts?
Bohan Jiang, Dawei Li, Zhen Tan, Chengshuai Zhao, Huan Liu

TL;DR
This paper evaluates whether current large language models can generate accurate, audience-tailored explanations of well-being concepts. It introduces a large-scale dataset and a principle-guided LLM-as-a-judge evaluation framework, and shows that fine-tuning improves explanation quality.
Contribution
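The paper builds a large-scale dataset of well-being explanations (43,880 explanations of 2,194 concepts, generated by ten LLMs), proposes a principle-guided LLM-as-a-judge evaluation framework with dual judges, and shows that fine-tuning an open-source LLM with Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) improves explanation quality.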
Findings
LLM judges align well with human evaluations
Explanation quality varies across models and audiences
Fine-tuned models outperform larger off-the-shelf models
Abstract
Well-being encompasses mental, physical, and social dimensions essential to personal growth and informed life decisions. As individuals increasingly consult Large Language Models (LLMs) to understand well-being, a key challenge emerges: Can LLMs generate explanations that are not only accurate but also tailored to diverse audiences? High-quality explanations require both factual correctness and the ability to meet the expectations of users with varying expertise. In this work, we construct a large-scale dataset comprising 43,880 explanations of 2,194 well-being concepts, generated by ten diverse LLMs. We introduce a principle-guided LLM-as-a-judge evaluation framework, employing dual judges to assess explanation quality. Furthermore, we show that fine-tuning an open-source LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) can significantly enhance the…
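For readers unfamiliar with DPO, the following is a minimal sketch of the standard per-pair DPO objective, not the authors' training code. It assumes the log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") explanation are already available from the model being trained and from a frozen reference model. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are illustrative assumptions; the summary above does not state the paper's hyperparameters.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for a single preference pair.

    All inputs are sequence-level log-probabilities (floats).
    beta=0.1 is an assumed value, not taken from the paper.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two margins.
    z = beta * (chosen_margin - rejected_margin)

    # -log(sigmoid(z)), computed in a numerically stable way.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


if __name__ == "__main__":
    # Policy identical to the reference: loss equals log(2).
    print(dpo_loss(-12.0, -15.0, -12.0, -15.0))
    # Policy shifted toward the chosen explanation: loss drops below log(2).
    print(dpo_loss(-10.0, -17.0, -12.0, -15.0))
```

The loss falls as the policy raises the likelihood of the preferred explanation relative to the reference model and lowers that of the dispreferred one. This is the sense in which preference pairs can steer a model toward better explanations.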
