Feedback Descent: Open-Ended Text Optimization via Pairwise Comparison
Yoonho Lee, Joseph Boen, Chelsea Finn

TL;DR
Feedback Descent introduces a novel text optimization framework that leverages detailed structured feedback instead of scalar rewards, enabling more effective and directed optimization of prompts, code, and molecules without modifying model weights.
Contribution
It presents a new method for text artifact optimization using structured textual feedback and in-context learning, outperforming existing methods across multiple domains.
Findings
Outperforms state-of-the-art prompt optimization methods.
Achieves superior results in molecular discovery benchmarks.
Enables targeted edits through in-context learning with structured feedback.
Abstract
We introduce Feedback Descent, a framework that optimizes text artifacts (prompts, code, and molecules) through structured textual feedback, rather than relying solely on scalar rewards. By preserving detailed critiques instead of compressing them to binary preferences, Feedback Descent widens the information bottleneck in preference learning, enabling directed optimization in text space rather than weight space. We show that in-context learning can transform structured feedback into gradient-like directional information, enabling targeted edits. Unlike prior approaches that collapse judgments into single bits, our evaluators pair each comparison with textual feedback, which functions as high-bandwidth supervision. The iteration loop is done purely at inference time, without modifying any model weights, and is task-agnostic. We evaluate Feedback Descent on three diverse…
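The loop the abstract describes (propose a candidate, compare it pairwise against the current best, keep the textual feedback, repeat with no weight updates) can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: the names `feedback_descent`, `toy_propose`, and `toy_compare` are hypothetical, and the toy proposer and evaluator are simple string functions standing in for what would presumably be model and evaluator calls.

```python
import re
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   Proposer:  (current best, feedback history) -> new candidate
#   Evaluator: (candidate, incumbent) -> (candidate preferred?, textual feedback)
Proposer = Callable[[str, List[str]], str]
Evaluator = Callable[[str, str], Tuple[bool, str]]


def feedback_descent(initial: str, propose: Proposer, compare: Evaluator,
                     num_iters: int) -> Tuple[str, List[str]]:
    """Inference-time loop: keep one best artifact and accumulate feedback."""
    best = initial
    history: List[str] = []
    for _ in range(num_iters):
        candidate = propose(best, history)              # edit conditioned on past feedback
        preferred, feedback = compare(candidate, best)  # pairwise comparison plus critique
        history.append(feedback)                        # the critique is kept, not discarded
        if preferred:
            best = candidate                            # only the winner survives
    return best, history


# Toy stand-ins so the sketch runs end to end.
TARGET = "feedback"


def toy_compare(candidate: str, incumbent: str) -> Tuple[bool, str]:
    def score(text: str) -> int:
        return sum(a == b for a, b in zip(text, TARGET))

    preferred = score(candidate) > score(incumbent)
    winner = candidate if preferred else incumbent
    # Describe the first remaining problem in whichever artifact won.
    for i, (got, want) in enumerate(zip(winner, TARGET)):
        if got != want:
            return preferred, f"position {i} should be '{want}'"
    return preferred, "no issues found"


def toy_propose(best: str, history: List[str]) -> str:
    # Apply the most recent critique as a targeted edit, if it names one.
    match = re.search(r"position (\d+) should be '(.)'", history[-1]) if history else None
    if match is None:
        return best
    i, ch = int(match.group(1)), match.group(2)
    return best[:i] + ch + best[i + 1:]


best, history = feedback_descent("xxxxxxxx", toy_propose, toy_compare, num_iters=10)
```

In this toy setting each critique names one concrete fix, so the incumbent improves by one character per accepted candidate and the run ends with `best == "feedback"`. Replacing the critique with the bare `preferred` bit would leave the proposer with nothing to act on, which is the contrast the abstract draws between textual feedback and single-bit judgments.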
Peer Reviews
Decision: ICLR 2026 Conference Desk Rejected Submission
- The paper correctly identifies that textual feedback carries far more information than binary preferences
- Good choice of experiments and baselines; the method seems effective
- Writing is generally clear and the figures look nice!
- **Scientific question incoherent / inconsistent**: in §1-§4, the paper sets up the scientific question of the paper as "does textual feedback provide a stronger learning signal than binary feedback". Then, §5 seems to answer the question "is direct feedback optimization an effective way to accomplish these tasks". _This is not the original question!!_ In my opinion, the "missing" baseline in each case is the same LLM-based optimizer with the same binary feedback, but _without_ the explanatory
1. The paper is generally well-written with clear motivation and contributions.
2. Algorithm 1 is simple, self-contained, and reproducible.
3. Inference-time optimization with no weight updates is valuable for practitioners. (Minor note: "SVG" should be expanded to Scalable Vector Graphics at first mention for accessibility.)
4. The framework is validated on three qualitatively different tasks:
   - Visual design (SVG),
   - Natural language (prompts), and
   - Chemistry (molecules).
5
1. The authors claim that "a paragraph of feedback contains more Shannon information than a single scalar or bit," which is trivially true in terms of raw bits. However, information content does not equal actionable directional information. Several critical questions remain unaddressed.
2. **Quantitative validation missing:** Can the authors quantify whether textual feedback actually provides gradient-aligned directions? For example, measure the correlation between feedback-suggested changes an
The method addresses a real limitation in preference learning by incorporating rich feedback rather than collapsing supervision into binary signals. It is model-agnostic and requires no parameter updates, making it widely applicable. Experiments span distinct domains and show improvements over well-established baselines. The use of textual rationales to guide generation is intuitive and aligns with emerging trends in LLM usage. The SVG and molecular tasks show the advantages of iterative feedba
1. The framework heavily depends on high-quality evaluators that can provide meaningful and consistent textual rationales. If the feedback is noisy, vague, or inconsistent, the system may stagnate or regress.
2. The update step only keeps one best candidate per iteration, which may limit diversity and exploration.
3. The paper lacks ablation studies showing how performance changes when textual feedback is removed or corrupted.
4. Although the method is said to be domain-general, all tasks
The authors identify an important area of research and apply their approach to three different domains, where they show that its performance is competitive. Exploiting the text space to provide feedback is an interesting idea. It can be thought of as analogous to gradient descent when the direction of text feedback aligns with the direction of gradients. Even if the latest feedback is not useful, as the authors show, the accumulation of past feedback and why it failed helps the model to ref
As the authors mention in the limitations section, the model will rely on strong evaluators, which can be challenging to obtain in certain domains, for example in designing antibodies. If there were a way to combine multiple feedback sources, such as an ensemble of feedback from text and latent space, and use it to guide the model, it might make the approach more robust. Second, strictly following either kind of feedback might also be limiting, and there is definitely scope to be creative on "how to use the feedback".
Taxonomy
Topics: Computational Drug Discovery Methods · Machine Learning in Materials Science · Advanced Graph Neural Networks
