RecoAtlas: From Semantic Plausibility to Set-Level Utility in LLM Recommendation Agents

Imad Aouali; Flavian Vasile; Otmane Sakhi; Alexandre Gilotte; Benjamin Heymann

arXiv:2605.18805 · cs.IR · May 20, 2026

TL;DR

RecoAtlas is a benchmark and toolkit for evaluating shopping recommendation agents with behavior-grounded metrics (relevance, complementarity, and diversity) rather than semantic plausibility alone; semantic coherence and explanation quality are measured separately.
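To make "set-level, behavior-grounded" concrete, here is a minimal sketch that scores a whole recommendation set from interaction data. It is not RecoAtlas's method: the paper describes learned utility proxies, whereas this sketch swaps in plain co-occurrence counts and Jaccard similarity. The function names (`build_cooccurrence`, `jaccard`, `set_level_scores`) and the session format are hypothetical.

```python
# Hypothetical sketch, not RecoAtlas's implementation: co-occurrence statistics
# stand in for the learned relevance / complementarity / diversity proxies.
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count sessions containing each item and each unordered item pair."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for session in sessions:
        items = sorted(set(session))
        for item in items:
            item_count[item] += 1
        # combinations() of a sorted list yields pairs already in sorted order.
        for a, b in combinations(items, 2):
            pair_count[(a, b)] += 1
    return item_count, pair_count


def jaccard(a, b, item_count, pair_count):
    """Behavioral similarity: sessions with both items / sessions with either."""
    both = pair_count.get(tuple(sorted((a, b))), 0)
    either = item_count.get(a, 0) + item_count.get(b, 0) - both
    return both / either if either else 0.0


def set_level_scores(history, recs, item_count, pair_count):
    """Score a recommendation set as a whole rather than item by item."""
    if not recs:
        raise ValueError("recommendation set is empty")
    # Relevance: each recommended item's best similarity to the user's history, averaged.
    relevance = sum(
        max((jaccard(r, h, item_count, pair_count) for h in history), default=0.0)
        for r in recs
    ) / len(recs)
    sims = [jaccard(a, b, item_count, pair_count) for a, b in combinations(recs, 2)]
    # Complementarity (toy proxy): share of recommended pairs that were ever co-interacted.
    complementarity = sum(s > 0 for s in sims) / len(sims) if sims else 0.0
    # Diversity: one minus the mean pairwise behavioral similarity inside the set.
    diversity = 1.0 - sum(sims) / len(sims) if sims else 1.0
    return {"relevance": relevance, "complementarity": complementarity, "diversity": diversity}


if __name__ == "__main__":
    sessions = [["tent", "stove", "lamp"], ["tent", "stove"], ["tent", "sleeping_bag"], ["book", "lamp"]]
    counts = build_cooccurrence(sessions)
    # relevance ≈ 0.458, complementarity 1.0, diversity ≈ 0.667
    print(set_level_scores(["tent"], ["stove", "lamp"], *counts))
```

Because complementarity and diversity depend on pairs inside the set, swapping a single item can change them even when every item looks individually plausible. Note that these two toy proxies pull in opposite directions (one rewards co-interacted pairs, the other penalizes behavioral similarity), so a real benchmark would report them separately rather than collapse them into one number.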

Contribution

It introduces an evaluation framework that scores agents with behavior-grounded metrics and uses controlled tool experiments (semantic, behavior-aligned, or faulty tools) to diagnose where performance gains come from.

Findings

1. Performance improves with larger models and better tools.
2. Semantic plausibility does not always correlate with behavior-grounded utility.
3. Performance degrades with noisy or misaligned signals.

Abstract

LLM recommendation agents increasingly produce structured recommendation reports: sets of items accompanied by natural-language justifications. Yet existing evaluations often reduce this setting to reranking small shortlisted candidate sets or judge reports mainly by semantic plausibility. We introduce Recommendation Atlas (Agentic Tool-Level Assessment for Shopping), or RecoAtlas, a benchmark and toolkit for evaluating shopping agents with behavior-grounded metrics. RecoAtlas complements held-out interaction metrics with learned utility proxies for relevance, complementarity, and diversity derived from interaction data, while separately measuring semantic coherence and explanation quality. Its controlled tool environment exposes agents to either semantic, behavior-aligned, or faulty tools, enabling diagnosis of whether performance gains arise from stronger reasoning, better signals, or…
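The controlled tool environment is what lets the benchmark separate stronger reasoning from better signals. A minimal sketch of the idea, assuming a single "related items" tool: the agent-facing call stays identical while the ranking signal behind it is semantic (title-token overlap), behavior-aligned (co-interaction counts), or faulty (random scores). The summary does not say how RecoAtlas implements its tools, so `ToolMode`, `RelatedItemsTool`, the catalog format, and the three scoring rules are hypothetical stand-ins.

```python
# Hypothetical sketch, not RecoAtlas's tool API: one tool interface, three swappable signals.
import random
from enum import Enum


class ToolMode(Enum):
    SEMANTIC = "semantic"  # ranks by text similarity of titles
    BEHAVIOR = "behavior"  # ranks by co-interaction counts
    FAULTY = "faulty"      # same interface, random scores


def token_overlap(a, b):
    """Jaccard overlap of lower-cased title tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class RelatedItemsTool:
    """Hypothetical 'related items' tool whose signal source is swapped per experiment."""

    def __init__(self, catalog, cooccurrence, mode, seed=0):
        self.catalog = catalog            # item_id -> title
        self.cooccurrence = cooccurrence  # sorted (item_a, item_b) -> co-interaction count
        self.mode = mode
        self.rng = random.Random(seed)    # seeded so faulty runs are reproducible

    def __call__(self, query_item, k=3):
        candidates = [i for i in self.catalog if i != query_item]
        if self.mode is ToolMode.SEMANTIC:
            scores = {i: token_overlap(self.catalog[query_item], self.catalog[i]) for i in candidates}
        elif self.mode is ToolMode.BEHAVIOR:
            scores = {i: self.cooccurrence.get(tuple(sorted((query_item, i))), 0) for i in candidates}
        else:
            scores = {i: self.rng.random() for i in candidates}
        # The agent only ever sees a ranked list of item ids, whatever the mode.
        return sorted(candidates, key=scores.__getitem__, reverse=True)[:k]


if __name__ == "__main__":
    catalog = {"t1": "camping tent 2 person", "t2": "camping tent 4 person",
               "s1": "camping stove", "b1": "paperback novel"}
    cooc = {("s1", "t1"): 5}
    # semantic -> ['t2'], behavior -> ['s1'], faulty -> a random ordering
    for mode in ToolMode:
        print(mode.value, RelatedItemsTool(catalog, cooc, mode)("t1", k=1))
```

Holding the interface fixed and varying only the signal is the design choice that matters here: if an agent's metrics move when only `mode` changes, the difference comes from the tool's signal rather than from the agent's reasoning.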
