Learning to Reason via Program Generation, Emulation, and Search

Nathaniel Weir; Muhammad Khalifa; Linlu Qiu; Orion Weller; Peter Clark

arXiv:2405.16337 · cs.CL · November 5, 2024

TL;DR

This paper introduces CoGEX, a method that extends language models' program synthesis capabilities to complex reasoning tasks by generating pseudo-programs, emulating their execution, and searching for optimal solutions, significantly improving performance on diverse tasks.

Contribution

The paper presents CoGEX, a novel approach combining program generation, emulation, and search to enable language models to handle complex reasoning tasks beyond traditional code-based problems.

Findings

01. CoGEX outperforms standard in-context learning on various algorithmic and reasoning tasks.

02. Emulating program execution helps fill knowledge gaps in language models.

03. Program search yields more accurate solutions across diverse datasets.
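
As a rough illustration of the program-search idea in finding 03, the sketch below scores a few candidate programs on a small labeled set and keeps the most accurate one. This is an assumption about the general shape of such a search, not the paper's actual procedure: in CoGEX the programs are emulated by the LM, whereas here plain Python callables stand in for them. The names `accuracy`, `search_best_program`, `CANDIDATES`, and `EXAMPLES` are hypothetical.

```python
from typing import Callable, Sequence, Tuple

# A "program" here is any callable mapping an input string to an answer string.
# In CoGEX the LM would emulate the program; a plain function stands in for that.
Program = Callable[[str], str]


def accuracy(program: Program, examples: Sequence[Tuple[str, str]]) -> float:
    """Fraction of labeled examples the program answers correctly."""
    correct = sum(program(x) == y for x, y in examples)
    return correct / len(examples)


def search_best_program(candidates: dict, examples: Sequence[Tuple[str, str]]) -> str:
    """Return the name of the candidate with the highest accuracy on the examples."""
    return max(candidates, key=lambda name: accuracy(candidates[name], examples))


# Toy candidates and data, invented purely for illustration.
CANDIDATES = {
    "always_yes": lambda text: "yes",
    "length_rule": lambda text: "yes" if len(text) > 3 else "no",
}
EXAMPLES = [("abcd", "yes"), ("ab", "no"), ("abcdef", "yes")]

best = search_best_program(CANDIDATES, EXAMPLES)
print(best)
```

The selected program would then be reused for every new instance of the task, which is what makes the search a form of task-level specialization.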

Abstract

Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g., word concatenation). However, not all reasoning tasks are easily expressible as code, e.g., tasks involving commonsense reasoning, moral decision-making, and sarcasm understanding. Our goal is to extend an LM's program synthesis skills to such tasks and evaluate the results via pseudo-programs, namely Python programs where some leaf function calls are left undefined. To that end, we propose Code Generation and Emulated EXecution (CoGEX). CoGEX works by (1) training LMs to generate pseudo-programs; (2) teaching them to emulate their generated program's execution, including those leaf functions, allowing the LM's knowledge to fill in the execution gaps; and (3) using…
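
To make the notion of a pseudo-program concrete, here is a minimal sketch that is not taken from the paper. It holds a small pseudo-program as a string and uses Python's `ast` module to list the leaf calls that have no definition, which are the calls an LM would have to fill in while emulating execution. The task, the leaf names `literal_meaning` and `speaker_intent`, and the helper `undefined_leaf_calls` are all hypothetical.

```python
import ast
import builtins

# A hypothetical pseudo-program for sarcasm detection. The two leaf functions
# are deliberately left undefined, so a real interpreter could not run it.
PSEUDO_PROGRAM = '''
def answer(utterance):
    literal = literal_meaning(utterance)   # undefined leaf call
    intended = speaker_intent(utterance)   # undefined leaf call
    return "sarcastic" if literal != intended else "sincere"
'''


def undefined_leaf_calls(source: str) -> list:
    """Names of functions that are called by name but never defined in `source`."""
    tree = ast.parse(source)
    defined = {
        node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
    }
    called = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    # Built-ins such as len() are executable, so they are not counted as gaps.
    return sorted(
        name for name in called
        if name not in defined and not hasattr(builtins, name)
    )


print(undefined_leaf_calls(PSEUDO_PROGRAM))
```

The listed names mark exactly where ordinary execution would fail and where, per the abstract, the LM's own knowledge is meant to fill the gap.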

Peer Reviews

Decision: NeurIPS 2024 poster

Reviewer 01 · Rating 7 · Confidence 4

Strengths

- The proposed method appears to strike a nice balance between the rigidity of code-structured reasoning and freeform CoT, as it performs well across domains where CoT excels and ones where it struggles.
- Being able to specialize reasoning for a particular task by selecting a pseudo-program is neat. It also appears to work well with a much smaller number of examples than are required to effectively fine-tune a model.
- The authors do a very good job of proactively answering natural questions wi

Weaknesses

1. The authors don't report statistical significance (e.g. through bootstrapping) or variance across runs with different subsamplings of training data.
2. As far as I can tell, the reported experiments don't really include domains where the correct solution can be reached by actually executing a fully specified program, besides Number Summing (accordingly, a program-of-thought baseline is also missing). In CoGEX the model is responsible for simulating program execution, so it seems likely that i
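
For readers unfamiliar with the bootstrapping the reviewer mentions in point 1, the following is a minimal sketch of a percentile bootstrap confidence interval for accuracy. It is a generic illustration, not an analysis from the paper or the review; the function name `bootstrap_accuracy_ci`, its parameters, and the toy outcomes are assumptions.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of 0/1 per-example outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    # Resample the test set with replacement and record each resample's accuracy.
    means = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lower_idx], means[upper_idx]


# Toy outcomes (1 = correct, 0 = wrong), invented for illustration.
outcomes = [1] * 70 + [0] * 30
low, high = bootstrap_accuracy_ci(outcomes)
print(f"accuracy = {sum(outcomes) / len(outcomes):.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

Two systems whose intervals overlap heavily on the same test set would not support a strong claim that one outperforms the other.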

Code & Models

Repositories

nweir127/cogex · Official

Datasets

mkhalifa/CoGEX
dataset · 25 dl

Taxonomy

Topics: Teaching and Learning Programming

Methods: Sparse Evolutionary Training