Scaling Textual Gradients via Sampling-Based Momentum

Zixin Ding; Junyuan Hong; Zhan Shi; Jiachen T. Wang; Zinan Lin; Li Yin; Meng Liu; Zhangyang Wang; Yuxin Chen

arXiv:2506.00400 · cs.CL · November 19, 2025


Open Access

TL;DR

This paper introduces TSGD-M, a sampling-based momentum method for scaling textual gradient optimization in prompt engineering. It addresses the challenges of scaling training data and keeping optimization stable when refining large language model prompts.

Contribution

It proposes a novel sampling-based momentum approach with Gumbel-Top-k sampling, improving scalability and stability in textual gradient descent for prompt optimization.
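The contribution above names Gumbel-Top-k sampling. The following is a minimal sketch of the generic Gumbel-Top-k trick, not the authors' implementation. It adds independent Gumbel(0, 1) noise to each candidate's score and keeps the k largest perturbed keys, which is equivalent to drawing k items without replacement from softmax(scores / temperature). The page does not say which scores TSGD-M feeds into this step or how k is chosen. The function name `gumbel_top_k`, the `temperature` parameter, and the example scores are hypothetical.

```python
import math
import random
from typing import List, Sequence


def gumbel_top_k(
    scores: Sequence[float],
    k: int,
    rng: random.Random,
    temperature: float = 1.0,
) -> List[int]:
    """Sample k distinct indices; equivalent to drawing k times without
    replacement from softmax(scores / temperature) (Gumbel-Top-k trick)."""
    if not 0 <= k <= len(scores):
        raise ValueError("k must satisfy 0 <= k <= len(scores)")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    keys = []
    for i, s in enumerate(scores):
        # rng.random() lies in [0, 1); clamp away from 0 so log() is defined.
        u = max(rng.random(), 1e-300)
        gumbel = -math.log(-math.log(u))  # standard Gumbel(0, 1) noise
        keys.append((s / temperature + gumbel, i))
    # Keep the k largest perturbed keys, highest first.
    keys.sort(reverse=True)
    return [i for _, i in keys[:k]]


# Hypothetical scores for four historical prompts; choose 2 of them.
history_scores = [0.62, 0.71, 0.58, 0.74]
picked = gumbel_top_k(history_scores, k=2, rng=random.Random(0), temperature=0.05)
```

The temperature is one way to trade exploration against exploitation. As it approaches 0, the noise becomes negligible and the call reduces to a deterministic top-k. Larger values spread the picks across more candidates.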

Findings

1. TSGD-M achieves consistent improvements across 5 benchmarks.
2. Gumbel-Top-k sampling balances exploration and exploitation effectively.
3. The method integrates seamlessly with existing prompt optimization frameworks.

Abstract

LLM-based prompt optimization, which uses LLM-provided "textual gradients" (feedback) to refine prompts, has emerged as an effective method for automatic prompt engineering. However, its scalability and stability are unclear when more training data is used. We systematically investigate the potential and challenges of scaling training data in textual gradient descent. We show that naively scaling training examples is infeasible due to both explicit context-length limits and an implicit context wall, where long-context degradation yields diminishing returns. Inspired by prior wisdom in stochastic gradient descent, we propose Textual Stochastic Gradient Descent with Momentum (TSGD-M), which reweights updates through momentum sampling, using bootstrapped minibatch validation accuracy as importance weights over historical prompts. We introduce Gumbel-Top-$k$ sampling for prompt generation,…
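The abstract describes momentum sampling that weights historical prompts by bootstrapped minibatch validation accuracy. Below is a minimal sketch of that idea under stated assumptions; it is not the authors' method. It assumes each prompt's weight is its accuracy on a single minibatch drawn with replacement from the validation set. It then picks one historical prompt with probability proportional to that weight. The `judge` callable stands in for an LLM call. It, the function names, `eps`, and the toy data are all hypothetical. The textual-gradient update that would then be applied to the sampled prompt is omitted.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical judge: True if `prompt` answers input x with label y.
# In practice this would call an LLM; here it is an injected callable.
Judge = Callable[[str, str, str], bool]


def bootstrap_minibatch_accuracy(
    prompt: str,
    val_set: Sequence[Tuple[str, str]],
    judge: Judge,
    batch_size: int,
    rng: random.Random,
) -> float:
    """Accuracy of `prompt` on a minibatch drawn with replacement from val_set."""
    batch = [rng.choice(val_set) for _ in range(batch_size)]
    hits = sum(judge(prompt, x, y) for x, y in batch)
    return hits / batch_size


def momentum_sample(
    history: List[str],
    val_set: Sequence[Tuple[str, str]],
    judge: Judge,
    batch_size: int,
    rng: random.Random,
    eps: float = 1e-6,
) -> str:
    """Pick one historical prompt with probability proportional to its
    bootstrapped minibatch accuracy (eps keeps every weight positive)."""
    weights = [
        bootstrap_minibatch_accuracy(p, val_set, judge, batch_size, rng) + eps
        for p in history
    ]
    return rng.choices(history, weights=weights, k=1)[0]


# Toy usage with a stand-in judge: only the prompt "good" is ever correct.
val = [("q1", "a1"), ("q2", "a2"), ("q3", "a3")]
toy_judge: Judge = lambda prompt, x, y: prompt == "good"
base = momentum_sample(["bad", "good"], val, toy_judge, batch_size=4, rng=random.Random(0))
```

Sampling instead of always taking the best prompt keeps some weight on earlier prompts. Because each weight comes from a fresh resampled minibatch, it is noisy, which is roughly how the sketch mimics a stochastic momentum term over the prompt history.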


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis