Once Upon an Input: Reasoning via Per-Instance Program Synthesis
Adam Stein, Neelay Velingker, Mayur Naik, Eric Wong

TL;DR
This paper introduces Per-Instance Program Synthesis (PIPS), a method that improves the reasoning of large language models (LLMs) by generating and refining a program for each individual problem instance, raising accuracy and reducing undesirable program generations.
Contribution
PIPS generates and refines reasoning programs using structural feedback, without task-specific guidance or explicit test cases, and outperforms Program of Thought (PoT) and Chain of Thought (CoT) prompting.
Findings
Improves accuracy by up to 9.4% over PoT and CoT.
Reduces undesirable program generations by 65.1%.
Effective across three frontier LLMs and 30 diverse benchmarks.
Abstract
Large language models (LLMs) excel at zero-shot inference but continue to struggle with complex, multi-step reasoning. Recent methods that augment LLMs with intermediate reasoning steps such as Chain of Thought (CoT) and Program of Thought (PoT) improve performance but often produce undesirable solutions, especially in algorithmic domains. We introduce Per-Instance Program Synthesis (PIPS), a method that generates and refines programs at the instance level using structural feedback without relying on task-specific guidance or explicit test cases. To further improve performance, PIPS incorporates a confidence metric that dynamically chooses between direct inference and program synthesis on a per-instance basis. Experiments across three frontier LLMs and 30 benchmarks including all tasks of Big Bench Extra Hard (BBEH), visual question answering tasks, relational reasoning tasks, and…
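The abstract describes PIPS only at a high level. As a rough illustration of the general shape of such a pipeline, the sketch below wires together a per-instance choice between direct answering and program synthesis, plus a refine-until-structurally-valid loop. Everything concrete here is our own assumption and not taken from the paper: the names `structural_issues`, `synthesize`, `run_program`, and `solve_instance`, the `confidence` callable, the 0.5 threshold, the three-round limit, and the specific structural checks (the program parses, defines `solve`, reads its input, and does not just return a hardcoded constant). The LLM calls are stubbed with plain callables.

```python
import ast
from typing import Callable, List, Optional

# Hypothetical LLM-backed generator: takes the instance plus feedback from the
# previous attempt and returns Python source code as a string.
Generator = Callable[[object, List[str]], str]


def structural_issues(code: str) -> List[str]:
    """Statically check a candidate program; an empty list means it passes.

    These checks are illustrative assumptions, not the paper's criteria.
    """
    try:
        tree = ast.parse(code)
    except SyntaxError as err:
        return [f"syntax error: {err.msg}"]
    funcs = [n for n in tree.body
             if isinstance(n, ast.FunctionDef) and n.name == "solve"]
    if not funcs:
        return ["no top-level function named 'solve'"]
    fn = funcs[0]
    issues = []
    # Flag programs whose body never references any of their parameters.
    params = {a.arg for a in fn.args.args}
    used = {n.id for n in ast.walk(fn) if isinstance(n, ast.Name)}
    if not params & used:
        issues.append("solve() never reads its input")
    # Flag a body that consists only of returning a literal (a hardcoded answer).
    body = fn.body
    if (len(body) == 1 and isinstance(body[0], ast.Return)
            and isinstance(body[0].value, ast.Constant)):
        issues.append("solve() returns a hardcoded constant")
    return issues


def synthesize(instance, generate: Generator, max_rounds: int = 3) -> Optional[str]:
    """Ask for a program, feeding structural issues back until it passes."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        code = generate(instance, feedback)
        feedback = structural_issues(code)
        if not feedback:
            return code
    return None  # no structurally valid program within the budget


def run_program(code: str, instance):
    namespace: dict = {}
    exec(code, namespace)  # NOTE: unsandboxed; a real system should isolate this
    return namespace["solve"](instance)


def solve_instance(instance, confidence: Callable[[object], float],
                   generate: Generator, answer_directly: Callable[[object], object],
                   threshold: float = 0.5):
    """Per-instance choice: synthesize a program if confident, else answer directly."""
    if confidence(instance) >= threshold:
        code = synthesize(instance, generate)
        if code is not None:
            return run_program(code, instance)
    return answer_directly(instance)


# Usage with a stubbed generator: the first draft hardcodes an answer and is
# rejected; the second reads its input and is accepted.
drafts = iter([
    "def solve(xs):\n    return 42\n",
    "def solve(xs):\n    return sum(xs)\n",
])
result = solve_instance(
    [1, 2, 3],
    confidence=lambda inst: 0.9,
    generate=lambda inst, feedback: next(drafts),
    answer_directly=lambda inst: "direct answer",
)
print(result)  # 6
```

Passing `feedback` back into `generate` is what turns repeated sampling into refinement; falling back to `answer_directly` when confidence is low or no program passes the checks is one simple way to realize a per-instance choice between the two modes.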
