Learning to Reason via Program Generation, Emulation, and Search
Nathaniel Weir, Muhammad Khalifa, Linlu Qiu, Orion Weller, Peter Clark

TL;DR
This paper introduces CoGEX, a method that extends language models' program synthesis capabilities to reasoning tasks that are hard to express as code. It trains models to generate pseudo-programs, emulate their execution, and search over candidate programs, improving on standard in-context learning across a range of algorithmic and reasoning tasks.
Contribution
The paper presents CoGEX, a novel approach combining program generation, emulation, and search to enable language models to handle complex reasoning tasks beyond traditional code-based problems.
Findings
CoGEX outperforms standard in-context learning on various algorithmic and reasoning tasks.
Emulating program execution helps fill knowledge gaps in language models.
Program search yields more accurate solutions across diverse datasets.
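To make the program-search finding concrete, here is a minimal sketch of one plausible reading: score each candidate program on a handful of labeled examples and keep the most accurate one. The function name `search_best_program`, the toy candidates, and the examples are all hypothetical and are not taken from the paper. In CoGEX the candidates would be pseudo-programs whose execution is emulated by the LM, whereas here they are ordinary Python callables so the sketch runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[List[int], int]  # (input, expected output); hypothetical format


def search_best_program(
    candidates: Sequence[Callable[[List[int]], int]],
    examples: Sequence[Example],
) -> Tuple[Callable[[List[int]], int], float]:
    """Return the candidate with the highest accuracy on the labeled examples."""
    if not examples:
        raise ValueError("need at least one labeled example")
    best_program, best_accuracy = None, -1.0
    for program in candidates:
        # Count how many examples this candidate answers correctly.
        correct = sum(program(x) == y for x, y in examples)
        accuracy = correct / len(examples)
        if accuracy > best_accuracy:
            best_program, best_accuracy = program, accuracy
    return best_program, best_accuracy


def positive_sum(xs: List[int]) -> int:
    # A deliberately flawed candidate: it ignores negative numbers.
    return sum(x for x in xs if x > 0)


# Toy number-summing task (illustrative data only).
examples = [([1, 2, 3], 6), ([4, -1], 3)]
candidates = [max, positive_sum, sum]

best, accuracy = search_best_program(candidates, examples)
```

On this toy data the built-in `sum` is selected, since it is the only candidate that answers both examples correctly. Ties are resolved in favor of the earlier candidate, which is a design choice of this sketch rather than something the source specifies.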
Abstract
Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word concatenation). However, not all reasoning tasks are easily expressible as code, e.g. tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. Our goal is to extend an LM's program synthesis skills to such tasks and evaluate the results via pseudo-programs, namely Python programs where some leaf function calls are left undefined. To that end, we propose Code Generation and Emulated EXecution (CoGEX). CoGEX works by (1) training LMs to generate pseudo-programs; (2) teaching them to emulate their generated program's execution, including those leaf functions, allowing the LM's knowledge to fill in the execution gaps; and (3) using…
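As an illustration of what a pseudo-program might look like, the sketch below structures a sarcasm-understanding task as Python code. It is a hypothetical example, not one from the paper: the function names and the keyword rules are invented. In a real pseudo-program the two leaf functions would be left undefined and their results supplied by the LM during emulated execution; they are stubbed with trivial rules here only so that the sketch runs.

```python
def identify_literal_sentiment(text: str) -> str:
    # Leaf function. In a pseudo-program this would be undefined and
    # "executed" by the LM; this keyword rule is a stand-in assumption.
    return "positive" if "great" in text.lower() else "negative"


def identify_situation_sentiment(text: str) -> str:
    # Leaf function, stubbed the same way (assumption for illustration).
    lowered = text.lower()
    return "negative" if ("again" in lowered or "broke" in lowered) else "positive"


def is_sarcastic(text: str) -> dict:
    # The top-level program is fully specified: it fixes the reasoning
    # structure and delegates the judgment calls to the leaf functions.
    literal = identify_literal_sentiment(text)
    situation = identify_situation_sentiment(text)
    return {
        "literal": literal,
        "situation": situation,
        "answer": literal != situation,  # mismatch is read as sarcasm
    }


result = is_sarcastic("Great, my laptop broke again.")
```

The point of the structure is the division of labor: the code pins down which intermediate quantities are computed and how they combine, while the parts that resist formalization sit in the leaf calls.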
Peer Reviews
Decision: NeurIPS 2024 poster
- The proposed method appears to strike a nice balance between the rigidity of code-structured reasoning and freeform CoT, as it performs well across domains where CoT excels and ones where it struggles.
- Being able to specialize reasoning for a particular task by selecting a pseudo-program is neat. It also appears to work well with a much smaller number of examples than are required to effectively fine-tune a model.
- The authors do a very good job of proactively answering natural questions wi
1. The authors don't report statistical significance (e.g. through bootstrapping) or variance across runs with different subsamplings of training data.
2. As far as I can tell, the reported experiments don't really include domains where the correct solution can be reached by actually executing a fully specified program, besides Number Summing (accordingly, a program-of-thought baseline is also missing). In CoGEX the model is responsible for simulating program execution, so it seems likely that i
Taxonomy
Topics: Teaching and Learning Programming
Methods: Sparse Evolutionary Training
