HELM: A Human-Centered Evaluation Framework for LLM-Powered Recommender Systems
Sushant Mehta

TL;DR
This paper introduces HELM, a comprehensive human-centered evaluation framework for LLM-powered recommender systems, assessing multiple qualitative dimensions beyond traditional accuracy metrics to better capture user experience.
Contribution
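The paper presents HELM, an evaluation framework that systematically measures human-centered qualities of LLM-based recommenders across five dimensions: Intent Alignment, Explanation Quality, Interaction Naturalness, Trust & Transparency, and Fairness & Diversity.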
Findings
Among the systems evaluated, GPT-4 scores highest on explanation quality and interaction naturalness.
GPT-4 shows significant popularity bias compared to traditional methods.
HELM reveals critical quality aspects invisible to traditional metrics.
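To make the popularity-bias finding concrete, here is a minimal sketch of one common way to quantify such bias: the average popularity of recommended items, where an item's popularity is its share of all logged interactions. This is an illustrative assumption, not the metric HELM itself uses; the function name `average_recommendation_popularity` and its inputs are hypothetical.

```python
from collections import Counter


def average_recommendation_popularity(recommendations, interactions):
    """Mean popularity of all recommended items (hypothetical helper).

    recommendations: list of per-user recommendation lists (item ids).
    interactions: flat list of item ids, one entry per logged interaction.
    An item's popularity is its fraction of all interactions; items never
    seen in the log count as popularity 0.
    """
    counts = Counter(interactions)
    total = sum(counts.values())
    rec_items = [item for rec_list in recommendations for item in rec_list]
    if total == 0 or not rec_items:
        return 0.0
    return sum(counts[item] / total for item in rec_items) / len(rec_items)


# Toy log: item "a" accounts for 3 of 4 interactions.
log = ["a", "a", "a", "b"]
balanced = average_recommendation_popularity([["a"], ["b"]], log)
skewed = average_recommendation_popularity([["a"], ["a"]], log)
```

Under this measure, a recommender that keeps surfacing the most-interacted items (`skewed`) scores higher than one that spreads recommendations across items (`balanced`); comparing the score between systems gives a rough indication of relative popularity bias.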
Abstract
The integration of Large Language Models (LLMs) into recommendation systems has introduced unprecedented capabilities for natural language understanding, explanation generation, and conversational interactions. However, existing evaluation methodologies focus predominantly on traditional accuracy metrics, failing to capture the multifaceted human-centered qualities that determine the real-world user experience. We introduce HELM (Human-centered Evaluation for LLM-powered recoMmenders), a comprehensive evaluation framework that systematically assesses LLM-powered recommender systems across five human-centered dimensions: Intent Alignment, Explanation Quality, Interaction Naturalness, Trust & Transparency, and Fairness & Diversity. Through extensive experiments involving three state-of-the-art…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education · Topic Modeling
