AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents
Lingxiang Hu, Yiding Sun, Tianle Xia, Wenwei Li, Ming Xu, Liqun Liu, Peng Shu, Huan Yu, Jie Jiang

TL;DR
AD-Bench is a real-world, trajectory-aware benchmark designed to evaluate LLM agents' performance in complex advertising and marketing tasks involving multi-round interactions and domain-specific tools.
Contribution
The paper introduces AD-Bench, a benchmark built from real-world advertising tasks, with expert-verified reference answers and reference multi-tool trajectories. It targets a gap in the practical evaluation of LLM agents.
Findings
Gemini-3-Pro achieves Pass@1 = 68.0% on AD-Bench
Performance drops to Pass@1 = 49.4% on the most difficult level (L3)
Trajectory coverage is 70.1%, highlighting capability gaps in complex scenarios
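As a rough illustration of the two metrics reported above, the following Python sketch computes Pass@1 and a trajectory coverage score. Pass@1 is taken here as the fraction of tasks whose single attempt is judged correct. This summary does not define trajectory coverage, so the sketch assumes it is the share of reference tool calls that also appear in the agent's trajectory, ignoring order; the paper's actual definition may differ. The function names and tool names are hypothetical and are not taken from AD-Bench.

```python
from collections import Counter


def pass_at_1(results):
    """Fraction of tasks whose single attempt was judged correct."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(1 for ok in results if ok) / len(results)


def trajectory_coverage(reference_calls, agent_calls):
    """Share of reference tool calls matched by the agent's calls.

    ASSUMPTION: order-insensitive multiset matching on tool names.
    This is an illustrative definition, not the one used by the paper.
    """
    if not reference_calls:
        return 1.0  # nothing to cover
    ref = Counter(reference_calls)
    got = Counter(agent_calls)
    # Each reference call is matched at most once by an agent call.
    matched = sum(min(count, got[tool]) for tool, count in ref.items())
    return matched / len(reference_calls)


if __name__ == "__main__":
    # Hypothetical per-task correctness judgments (one attempt per task).
    print(pass_at_1([True, False, True, True]))

    # Hypothetical tool names, for illustration only.
    reference = ["query_campaign", "aggregate_spend", "compare_periods"]
    agent = ["query_campaign", "compare_periods", "query_campaign"]
    print(trajectory_coverage(reference, agent))
```

Under this assumed definition, an agent can reach the right answer while skipping a reference tool call, so Pass@1 and trajectory coverage measure different things and are worth reporting separately.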
Abstract
While Large Language Model (LLM) agents have achieved remarkable progress in complex reasoning tasks, evaluating their performance in real-world environments has become a critical problem. Current benchmarks, however, are largely restricted to idealized simulations, failing to address the practical demands of specialized domains like advertising and marketing analytics. In these fields, tasks are inherently more complex, often requiring multi-round interaction with professional marketing tools. To address this gap, we propose AD-Bench, a benchmark designed based on real-world business requirements of advertising and marketing platforms. AD-Bench is constructed from real user marketing analysis requests, with domain experts providing verifiable reference answers and corresponding reference tool-call trajectories. The benchmark categorizes requests into three difficulty levels (L1-L3) to…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Recommender Systems and Techniques · Multimodal Machine Learning Applications
