RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents
Imad Aouali, Flavian Vasile, Otmane Sakhi, Alexandre Gilotte, Benjamin Heymann

TL;DR
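RecoAtlas is a benchmark and toolkit for evaluating shopping recommendation agents. It scores recommendation reports with behavior-grounded metrics (relevance, complementarity, and diversity, derived from interaction data) and measures semantic coherence and explanation quality separately, rather than relying on semantic plausibility alone.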
Contribution
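It introduces an evaluation framework that combines behavior-grounded metrics with a controlled tool environment, in which agents are given semantic, behavior-aligned, or faulty tools, so that performance differences can be traced to their source, such as stronger reasoning or better signals.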
Findings
Performance improves with larger models and better tools.
Semantic plausibility does not always correlate with behavior-grounded utility.
Performance degrades with noisy or misaligned signals.
Abstract
LLM recommendation agents increasingly produce structured recommendation reports: sets of items accompanied by natural-language justifications. Yet existing evaluations often reduce this setting to reranking small shortlisted candidate sets or judge reports mainly by semantic plausibility. We introduce Recommendation Atlas (Agentic Tool-Level Assessment for Shopping), or RecoAtlas, a benchmark and toolkit for evaluating shopping agents with behavior-grounded metrics. RecoAtlas complements held-out interaction metrics with learned utility proxies for relevance, complementarity, and diversity derived from interaction data, while separately measuring semantic coherence and explanation quality. Its controlled tool environment exposes agents to either semantic, behavior-aligned, or faulty tools, enabling diagnosis of whether performance gains arise from stronger reasoning, better signals, or…
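To make the set-level framing concrete, here is a minimal sketch of two simple measures one could compute over a recommended set: a held-out hit rate and an intra-list diversity score. It is an illustration only, not RecoAtlas's implementation. The function names, the use of cosine distance, and the `embeddings` mapping (item vectors assumed to have been learned from interaction data) are all assumptions introduced here; the paper's learned utility proxies may be defined quite differently.

```python
from itertools import combinations
from math import sqrt


def hit_rate(recommended, held_out):
    """Fraction of held-out items that appear in the recommended set."""
    if not held_out:
        return 0.0
    rec = set(recommended)
    return sum(1 for item in held_out if item in rec) / len(held_out)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def intra_list_diversity(recommended, embeddings):
    """Mean pairwise cosine distance between items in the recommended set.

    `embeddings` is a hypothetical dict mapping item id -> vector, assumed
    here to come from interaction data rather than from text semantics.
    """
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    total = sum(1.0 - cosine(embeddings[a], embeddings[b]) for a, b in pairs)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    embeddings = {"tent": [1.0, 0.0], "stove": [0.0, 1.0], "tarp": [1.0, 0.0]}
    report = ["tent", "stove", "tarp"]
    print(hit_rate(report, ["stove", "lantern"]))
    print(intra_list_diversity(report, embeddings))
```

Both functions score the whole set rather than individual items, which is the distinction the abstract draws between set-level utility and per-item plausibility.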
