DARA: Few-shot Budget Allocation in Online Advertising via In-Context Decision Making with RL-Finetuned LLMs

Mingxuan Song; Yusen Huo; Bohan Zhou; Shenglin Yin; Zhen Xiao; Jieyi Long; Zhilin Zhang; Chuan Yu

arXiv:2601.14711·cs.AI·January 22, 2026

Open Access

TL;DR

This paper introduces DARA, a dual-phase framework utilizing RL-finetuned LLMs for few-shot budget allocation in online advertising, combining in-context reasoning with precise optimization to improve advertiser value.

Contribution

The paper proposes DARA, a novel two-stage decision-making framework that leverages RL-finetuned LLMs for effective few-shot budget allocation in online advertising.

Findings

01 DARA outperforms existing baselines in real-world and synthetic experiments.

02 The dual-phase approach effectively combines reasoning and optimization.

03 RL-finetuned LLMs enhance decision quality in data-scarce scenarios.

Abstract

Optimizing the advertiser's cumulative value of winning impressions under budget constraints poses a complex challenge in online advertising, under the paradigm of AI-Generated Bidding (AIGB). Advertisers often have personalized objectives but limited historical interaction data, resulting in few-shot scenarios where traditional reinforcement learning (RL) methods struggle to perform effectively. Large Language Models (LLMs) offer a promising alternative for AIGB by leveraging their in-context learning capabilities to generalize from limited data. However, they lack the numerical precision required for fine-grained optimization. To address this limitation, we introduce GRPO-Adaptive, an efficient LLM post-training strategy that enhances both reasoning and numerical precision by dynamically updating the reference policy during training. Built upon this foundation, we further propose…
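
The abstract describes GRPO-Adaptive only at a high level, so the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the usual GRPO-style group-relative advantage (each sampled response's reward normalized within its group) and pairs it with a hypothetical periodic refresh of the reference policy. The function names, the `refresh_every` parameter, and the copy-on-schedule rule are illustrative assumptions; the paper's actual update rule may differ.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]


def maybe_refresh_reference(step, policy_params, ref_params, refresh_every=100):
    """Hypothetical schedule (an assumption, not from the paper):
    copy the current policy into the reference every `refresh_every` steps,
    so the KL penalty is measured against a recent policy snapshot."""
    if step > 0 and step % refresh_every == 0:
        return dict(policy_params)  # new reference is a snapshot of the policy
    return ref_params  # otherwise keep the existing reference


# Toy usage with made-up rewards for three sampled responses.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
reference = maybe_refresh_reference(100, {"w": 1.0}, {"w": 0.0})
```

With a fixed reference, the KL term pulls the policy back toward its starting point throughout training; refreshing the reference lets that anchor move with the policy, which is one plausible reading of "dynamically updating the reference policy."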

Taxonomy

Topics: Recommender Systems and Techniques · Consumer Market Behavior and Pricing · Advanced Bandit Algorithms Research