Rank, Don't Generate: Statement-level Ranking for Explainable Recommendation
Ben Kabongo, Arthur Satouf, Vincent Guigue

TL;DR
This paper proposes a statement-level ranking approach for explainable recommendation systems, addressing evaluation challenges and introducing a new benchmark called StaR.
Contribution
It formalizes explainable recommendation as a statement-level ranking problem, develops an extraction and clustering pipeline, and introduces the StaR benchmark for evaluation.
Findings
Popularity baselines perform well in global ranking.
State-of-the-art models excel in item-level ranking.
Limitations in personalized explanation ranking are identified.
Abstract
Textual explanations, generated with large language models (LLMs), are increasingly used to justify recommendations. Yet, evaluating these explanations remains a critical challenge. We advocate a shift in objective: rank, don't generate. We formalize explainable recommendation as a statement-level ranking problem, where systems rank candidate explanatory statements derived from reviews and return the top-k as explanation. This formulation mitigates hallucination by construction and enables fine-grained factual analysis. It also models factor importance through relevance scores and supports standardized, reproducible evaluation with established ranking metrics. Meaningful assessment, however, requires each statement to be explanatory (item facts affecting user experience), atomic (one opinion about one aspect), and unique (paraphrases consolidated), which is challenging to obtain from…
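To make the formulation in the abstract concrete, here is a minimal Python sketch of statement-level ranking: score candidate statements, return the top-k as the explanation, and evaluate the result with a standard ranking metric. It is an illustration under assumptions, not the authors' implementation. The function names (`rank_top_k`, `ndcg_at_k`), the example statements, the scores, and the graded relevance values are all hypothetical, and NDCG@k stands in here as one example of an established ranking metric.

```python
import math
from typing import Dict, List


def rank_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k candidate statements with the highest scores.

    Ties are broken alphabetically so the output is deterministic.
    """
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with a log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked: List[str], relevance: Dict[str, float], k: int) -> float:
    """NDCG@k of a ranked statement list against graded relevance scores."""
    gains = [relevance.get(s, 0.0) for s in ranked[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical model scores for candidate statements about one item.
scores = {
    "battery lasts long": 0.9,
    "screen is dim": 0.2,
    "camera is sharp": 0.7,
    "case feels cheap": 0.4,
}
# Hypothetical ground-truth relevance for one user-item pair.
relevance = {"battery lasts long": 2.0, "camera is sharp": 1.0}

top2 = rank_top_k(scores, k=2)
score_top2 = ndcg_at_k(top2, relevance, k=2)
score_bad = ndcg_at_k(["screen is dim", "battery lasts long"], relevance, k=2)
print(top2, score_top2, score_bad)
```

Because the output is always a subset of the candidate pool, the explanation cannot contain a statement that was never extracted from reviews, which is the sense in which ranking avoids hallucination by construction.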
