Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations
Cedric Waterschoot, Nava Tintarev, Francesco Barile

TL;DR
This paper evaluates the reliability and consistency of LLM-generated recommendations and explanations in group recommender systems. It finds that the recommendations tend to resemble Additive Utilitarian aggregation, while the accompanying explanations are inconsistent and limit transparency.
Contribution
It provides a comparative analysis of LLM outputs against social choice-based strategies, exposing limitations in consistency and explainability in group recommendation contexts.
Findings
LLMs often mimic Additive Utilitarian aggregation in recommendations.
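Explanations typically cite averaging ratings, which resembles but is not identical to Additive Utilitarian aggregation, and they regularly invoke extra criteria such as similarity, diversity, or undefined popularity thresholds.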
Inconsistent explanations reduce transparency and undermine trust.
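The distinction between Additive Utilitarian (ADD) aggregation and averaging can be illustrated with a small sketch. It is not taken from the paper: the function names, the toy ratings, and the handling of missing ratings are all assumptions made for illustration. When every group member rates every item, summing and averaging produce the same ranking. One way they can diverge, assumed here, is when some ratings are missing and the average is taken only over members who rated the item.

```python
from typing import Dict, List, Optional

# Hypothetical data layout: user -> item -> rating (None means "not rated").
Ratings = Dict[str, Dict[str, Optional[float]]]


def additive_utilitarian(ratings: Ratings, items: List[str]) -> Dict[str, float]:
    """ADD: score each item by the sum of the available individual ratings."""
    return {
        item: sum(r[item] for r in ratings.values() if r.get(item) is not None)
        for item in items
    }


def average(ratings: Ratings, items: List[str]) -> Dict[str, float]:
    """Average: score each item by the mean over members who rated it.

    Assumption: missing ratings are skipped rather than counted as zero.
    """
    scores = {}
    for item in items:
        present = [r[item] for r in ratings.values() if r.get(item) is not None]
        scores[item] = sum(present) / len(present) if present else 0.0
    return scores


def top(scores: Dict[str, float]) -> str:
    """Return the item with the highest group score."""
    return max(scores, key=scores.get)


# Toy group: carol has not rated item B.
ratings: Ratings = {
    "alice": {"A": 4, "B": 5},
    "bob": {"A": 4, "B": 5},
    "carol": {"A": 3, "B": None},
}
items = ["A", "B"]

add_scores = additive_utilitarian(ratings, items)  # A: 11, B: 10
avg_scores = average(ratings, items)               # A: 11/3, B: 5.0

print(top(add_scores), top(avg_scores))
```

In this toy group, ADD prefers item A while averaging prefers item B, so an explanation that describes averaging would not match a recommendation that follows ADD.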
Abstract
Large Language Models (LLMs) are increasingly being implemented as joint decision-makers and explanation generators for Group Recommender Systems (GRS). In this paper, we evaluate these recommendations and explanations by comparing them to social choice-based aggregation strategies. Our results indicate that LLM-generated recommendations often resembled those produced by Additive Utilitarian (ADD) aggregation. However, the explanations typically referred to averaging ratings (resembling but not identical to ADD aggregation). Group structure, uniform or divergent, did not impact the recommendations. Furthermore, LLMs regularly claimed additional criteria such as user or item similarity, diversity, or used undefined popularity metrics or thresholds. Our findings have important implications for LLMs in the GRS pipeline as well as standard aggregation strategies. Additional criteria in…
