Once Upon an Input: Reasoning via Per-Instance Program Synthesis

Adam Stein; Neelay Velingker; Mayur Naik; Eric Wong

arXiv:2510.22849 · cs.CL · October 28, 2025

TL;DR

This paper introduces Per-Instance Program Synthesis (PIPS), a method that improves the reasoning of large language models by generating and refining a program for each individual input. It reports accuracy gains of up to 9.4% over Chain of Thought (CoT) and Program of Thought (PoT) and 65.1% fewer undesirable program generations.

Contribution

PIPS uses structural feedback to generate and refine reasoning programs at the instance level, without task-specific guidance or explicit test cases, and uses a confidence metric to choose between direct inference and program synthesis for each input.

Findings

01. PIPS improves accuracy by up to 9.4% over PoT and CoT.

02. Reduces undesirable program generations by 65.1%.

03. Effective across multiple LLMs and diverse benchmarks.

Abstract

Large language models (LLMs) excel at zero-shot inference but continue to struggle with complex, multi-step reasoning. Recent methods that augment LLMs with intermediate reasoning steps, such as Chain of Thought (CoT) and Program of Thought (PoT), improve performance but often produce undesirable solutions, especially in algorithmic domains. We introduce Per-Instance Program Synthesis (PIPS), a method that generates and refines programs at the instance level using structural feedback, without relying on task-specific guidance or explicit test cases. To further improve performance, PIPS incorporates a confidence metric that dynamically chooses between direct inference and program synthesis on a per-instance basis. Experiments across three frontier LLMs and 30 benchmarks, including all tasks of Big Bench Extra Hard (BBEH), visual question answering tasks, relational reasoning tasks, and…
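The control flow described in the abstract (a per-instance choice between direct inference and program synthesis, followed by refinement driven by structural feedback) can be sketched roughly as below. This is a minimal illustration and not the authors' implementation: every name (`solve_instance`, `toy_check`, `toy_synthesize`, `toy_run`), the `threshold` and `max_rounds` parameters, and the reading of the confidence score as "confidence that a program will help" are assumptions made for the example. The model calls are replaced by plain callables so the sketch runs on its own.

```python
import ast
from typing import Callable, List


def solve_instance(
    question: str,
    confidence: Callable[[str], float],
    answer_directly: Callable[[str], str],
    synthesize: Callable[[str, List[str]], str],
    check_structure: Callable[[str], List[str]],
    run_program: Callable[[str], str],
    threshold: float = 0.5,   # hypothetical cut-off, not from the paper
    max_rounds: int = 3,      # hypothetical refinement budget
) -> str:
    """Answer one input, choosing the strategy for this instance only."""
    # Per-instance switch: fall back to direct inference when the
    # (assumed) confidence that a program will help is low.
    if confidence(question) < threshold:
        return answer_directly(question)

    # Synthesize a program, then refine it using structural feedback.
    # No test cases are involved; the checks only inspect the program.
    program = synthesize(question, [])
    for _ in range(max_rounds):
        issues = check_structure(program)
        if not issues:
            break
        program = synthesize(question, issues)
    # If the budget runs out, the latest program is executed as-is.
    return run_program(program)


def toy_check(program: str) -> List[str]:
    """Hypothetical structural check: flag hard-coded constant answers."""
    tree = ast.parse(program)
    returns = [n for n in ast.walk(tree) if isinstance(n, ast.Return)]
    if returns and all(isinstance(r.value, ast.Constant) for r in returns):
        return ["program returns a hard-coded constant"]
    return []


def toy_synthesize(question: str, feedback: List[str]) -> str:
    """Stand-in for an LLM: first draft is trivial, revision computes."""
    if not feedback:
        return "def solve():\n    return 7\n"
    return "def solve():\n    return sum([3, 4])\n"


def toy_run(program: str) -> str:
    """Execute the program and return the result of its solve() function."""
    scope: dict = {}
    exec(program, scope)
    return str(scope["solve"]())
```

With these stand-ins, a high confidence score routes the input through synthesis and one refinement round (the first draft is rejected by `toy_check`), while a low score returns the direct answer without generating any program. Real structural checks and a real confidence estimate would be considerably richer than these toy versions.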
