AD-Bench: A Real-World, Trajectory-Aware Advertising Analytics Benchmark for LLM Agents

Lingxiang Hu; Yiding Sun; Tianle Xia; Wenwei Li; Ming Xu; Liqun Liu; Peng Shu; Huan Yu; Jie Jiang

arXiv:2602.14257 · cs.CL · February 17, 2026

Open Access

TL;DR

AD-Bench is a real-world, trajectory-aware benchmark designed to evaluate LLM agents' performance in complex advertising and marketing tasks involving multi-round interactions and domain-specific tools.

Contribution

The paper introduces AD-Bench, a benchmark built from real-world advertising tasks with expert-verified answers and multi-tool reference trajectories. It fills a gap left by existing benchmarks, which are largely restricted to idealized simulations.

Findings

01

Gemini-3-Pro achieves Pass@1 = 68.0% on AD-Bench

02

Performance drops to Pass@1 = 49.4% on the most difficult level (L3)

03

Trajectory coverage is 70.1%, highlighting capability gaps in complex scenarios
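This page reports Pass@1 and trajectory coverage without defining them. The following is a minimal Python sketch, not the paper's evaluation code. It assumes that Pass@1 is the share of tasks solved on a single attempt. It also assumes a hypothetical definition of trajectory coverage: the mean fraction of reference tool calls that the agent also made, with repeated calls counted by multiplicity. The names `EpisodeResult`, `pass_at_1`, `trajectory_coverage` and `by_level` are illustrative. The toy results are invented and are not taken from the paper.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    level: str                   # difficulty level: "L1", "L2", or "L3"
    correct: bool                # single attempt judged equal to the reference answer
    reference_calls: list[str]   # tool names in the expert reference trajectory
    agent_calls: list[str]       # tool names the agent actually invoked


def pass_at_1(results: list[EpisodeResult]) -> float:
    """Share of tasks whose single attempt is correct."""
    return sum(r.correct for r in results) / len(results)


def trajectory_coverage(results: list[EpisodeResult]) -> float:
    """ASSUMED definition: mean fraction of reference tool calls also made
    by the agent, counting repeated calls with multiplicity."""
    scores = []
    for r in results:
        if not r.reference_calls:
            continue  # nothing to cover for this task
        overlap = Counter(r.reference_calls) & Counter(r.agent_calls)
        scores.append(sum(overlap.values()) / len(r.reference_calls))
    return sum(scores) / len(scores) if scores else 0.0


def by_level(results: list[EpisodeResult], metric) -> dict[str, float]:
    """Apply a metric separately to each difficulty level."""
    groups = defaultdict(list)
    for r in results:
        groups[r.level].append(r)
    return {level: metric(group) for level, group in sorted(groups.items())}


# Toy results for illustration only; tool names are placeholders.
results = [
    EpisodeResult("L1", True, ["tool_a"], ["tool_a"]),
    EpisodeResult("L3", False, ["tool_a", "tool_b", "tool_c"], ["tool_a", "tool_b"]),
]
print(pass_at_1(results))              # 0.5
print(by_level(results, pass_at_1))    # {'L1': 1.0, 'L3': 0.0}
```

Breaking the metrics down with `by_level` mirrors how the findings contrast overall Pass@1 with Pass@1 on the hardest level. Counting overlap with `Counter` intersection means an agent gets no extra credit for calling the same tool more often than the reference does.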

Abstract

While Large Language Model (LLM) agents have achieved remarkable progress in complex reasoning tasks, evaluating their performance in real-world environments has become a critical problem. Current benchmarks, however, are largely restricted to idealized simulations and fail to address the practical demands of specialized domains such as advertising and marketing analytics. In these fields, tasks are inherently more complex and often require multi-round interaction with professional marketing tools. To address this gap, we propose AD-Bench, a benchmark designed around the real-world business requirements of advertising and marketing platforms. AD-Bench is constructed from real user marketing analysis requests, with domain experts providing verifiable reference answers and corresponding reference tool-call trajectories. The benchmark categorizes requests into three difficulty levels (L1-L3) to…
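The abstract lists what each benchmark item contains: a real user request, an expert-verified reference answer, a reference tool-call trajectory and a difficulty level from L1 to L3. It does not specify a data format. The following is a minimal sketch of how one item could be represented and validated. It assumes a JSON Lines layout with hypothetical field names (`request`, `reference_answer`, `reference_trajectory`, `tool`, `arguments`, `level`) that are not taken from the paper, and every value in the placeholder record is made up.

```python
import json
from dataclasses import dataclass

VALID_LEVELS = {"L1", "L2", "L3"}  # the three difficulty levels named in the abstract


@dataclass
class ToolCall:
    tool: str         # hypothetical: name of a marketing-analytics tool
    arguments: dict   # hypothetical: arguments passed to that tool


@dataclass
class BenchmarkItem:
    request: str                          # real user marketing-analysis request
    reference_answer: str                 # expert-verified reference answer
    reference_trajectory: list[ToolCall]  # expert reference tool-call sequence
    level: str                            # difficulty level, "L1" to "L3"


def parse_item(line: str) -> BenchmarkItem:
    """Parse one JSON Lines record (hypothetical format) and validate its level."""
    raw = json.loads(line)
    if raw["level"] not in VALID_LEVELS:
        raise ValueError(f"unknown difficulty level: {raw['level']!r}")
    calls = [ToolCall(c["tool"], c.get("arguments", {}))
             for c in raw["reference_trajectory"]]
    return BenchmarkItem(raw["request"], raw["reference_answer"], calls, raw["level"])


# Placeholder record; every value here is made up for illustration.
line = json.dumps({
    "request": "example request",
    "reference_answer": "example answer",
    "reference_trajectory": [{"tool": "tool_a", "arguments": {"days": 7}},
                             {"tool": "tool_b"}],
    "level": "L2",
})
item = parse_item(line)
print(item.level, [c.tool for c in item.reference_trajectory])  # L2 ['tool_a', 'tool_b']
```

The loader checks the level when each record is read, so a mislabeled item fails immediately instead of silently shifting the per-level results.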

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Recommender Systems and Techniques · Multimodal Machine Learning Applications