Certified Deductive Reasoning with Language Models

Gabriel Poesia; Kanishk Gandhi; Eric Zelikman; Noah D. Goodman

arXiv:2306.04031·cs.AI·November 9, 2023·2 cites

TL;DR

This paper introduces LogicGuide, a tool that guides language models to produce logically sound reasoning traces, significantly improving accuracy and reducing errors in complex reasoning tasks.

Contribution

The paper presents LogicGuide, a novel framework for guiding language models with formal reasoning tools to ensure soundness and enable self-improvement.

Findings

01

LogicGuide improves GPT-3, GPT-3.5 Turbo, and LLaMA accuracy by up to 35%.

02

It drastically reduces content effects and reasoning errors.

03

Self-bootstrapping with LogicGuide enhances performance on real-world datasets.

Abstract

Language models often achieve higher accuracy when reasoning step-by-step in complex tasks. However, even when arriving at a correct final answer, their rationales are often logically unsound or inconsistent. This is a major issue when reliable reasoning traces are needed, such as when fine-tuning on model-generated reasoning for self-improvement. To tackle these issues, we introduce a class of tools for language models called guides, which use state and incremental constraints to guide generation. A guide can be invoked by the model to constrain its own generation to a set of valid statements given by the tool. In turn, the model's choices can change the guide's state. We show how a general system for logical reasoning can be used as a guide, which we call LogicGuide. Given a reasoning problem in natural language, a model can formalize its assumptions for…
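The guide interaction described in the abstract can be pictured with a toy sketch. This is not the paper's implementation: `ToyGuide`, `constrained_pick`, the single modus ponens rule format, and the example facts are all hypothetical names chosen for illustration. The sketch only shows the two properties the abstract states: the guide restricts the next statement to a valid set, and each choice updates the guide's state.

```python
from dataclasses import dataclass, field


@dataclass
class ToyGuide:
    """Hypothetical stand-in for a guide: keeps state, lists valid next statements."""
    facts: set = field(default_factory=set)
    rules: list = field(default_factory=list)  # (premise, conclusion) pairs

    def assume(self, fact):
        self.facts.add(fact)

    def add_rule(self, premise, conclusion):
        self.rules.append((premise, conclusion))

    def valid_next(self):
        # One-step conclusions by modus ponens that are not already known.
        return sorted({c for p, c in self.rules
                       if p in self.facts and c not in self.facts})

    def choose(self, statement):
        # The model's choice changes the guide's state.
        if statement not in self.valid_next():
            raise ValueError(f"not a valid one-step conclusion: {statement}")
        self.facts.add(statement)


def constrained_pick(guide, ranked_preferences):
    """Return the most preferred statement that the guide currently allows."""
    allowed = guide.valid_next()
    for statement in ranked_preferences:
        if statement in allowed:
            guide.choose(statement)
            return statement
    return None


guide = ToyGuide()
guide.assume("cat(tom)")
guide.add_rule("cat(tom)", "mammal(tom)")
guide.add_rule("mammal(tom)", "animal(tom)")

# The "model" would prefer to jump straight to animal(tom), but the guide
# only allows the one-step conclusion mammal(tom) at this point.
first = constrained_pick(guide, ["animal(tom)", "mammal(tom)"])
second = constrained_pick(guide, ["animal(tom)"])
```

In this sketch the unsupported jump is skipped on the first call and becomes available on the second, once the intermediate fact has entered the guide's state.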

Peer Reviews

Decision·Submitted to ICLR 2024

Reviewer 01 · Rating 5: marginally below the acceptance threshold · Confidence 3

Strengths

1. The paper introduces a novel logical guidance framework designed to aid LLMs in performing logical inference. The method employs the most general form of deductive reasoning, making it versatile across a range of reasoning scenarios.
2. Experiments across multiple datasets validate that LogicGuide enhances the performance of language models. The paper also provides specific examples demonstrating its efficacy in mitigating the impact of unwarranted prior assumptions and performing self-learni

Weaknesses

1. The proposed method relies on a complex formalization process during training and inference.
2. The scenarios considered in the paper seem a bit limited. Despite experiments on diverse datasets, the nature of the problems within them appears quite similar. In more generalized contexts, it might be challenging to formalize and identify corresponding actions, such as `objects`, `relations`, etc.
3. The paper's primary contribution, namely, how to harness logic to ensure output c

Reviewer 02 · Rating 5: marginally below the acceptance threshold · Confidence 4

Strengths

- Logical reasoning (or more generally multi-hop reasoning) in natural language with LLMs is an important area of research.
- Showing results both for prompting and finetuning.
- The writing was mostly clear and easy to follow.
- The reported improvements for ReClor could be quite encouraging.

Weaknesses

- Most of the experiments are done on the ProofWriter and the PrOntoQA datasets. Both these datasets have been constructed by turning logical theories into natural language using very simple templates. This is especially true for the PrOntoQA dataset where each sentence is of the format "X is Y" which is simply equivalent to (X, is, Y) in the triple notation. For this reason, while these datasets are appropriate benchmarks for measuring the general reasoning capacity of off-the-shelf LLMs, I do

Reviewer 03 · Rating 8: accept, good paper · Confidence 3

Strengths

The high-level idea seems good (but I'm not so clear about the details). The results are very good.

Weaknesses

The main problem with this is that the details of the architecture aren't clear. Here is what I understand: The LLM gets language (the "Context" in the figures). The LLM generates a "formalized context" that can be used as the input to Peano. Peano implements a guide function, and outputs a set of valid one-step conclusions. This is input back into the LLM by biasing the logits (whatever that means), then presumably the LLM does something else to generate the next formalized contexts to do the next s
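One common reading of "biasing the logits" is hard masking: tokens that cannot continue any valid conclusion have their logits set to negative infinity before sampling, so they receive zero probability. The sketch below illustrates that reading only; it is an assumption, not the paper's documented mechanism, and `mask_logits`, `softmax`, the four-token vocabulary, and the scores are hypothetical.

```python
import math


def mask_logits(logits, allowed_ids):
    """Keep logits for allowed token ids; set all others to -inf."""
    if not allowed_ids:
        raise ValueError("at least one token must be allowed")
    return [x if i in allowed_ids else -math.inf
            for i, x in enumerate(logits)]


def softmax(logits):
    """Numerically stable softmax; -inf logits map to probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores over a four-token vocabulary.
logits = [2.0, 1.0, 0.5, 3.0]
# Suppose the guide says only tokens 0 and 2 can start a valid conclusion.
allowed = {0, 2}

probs = softmax(mask_logits(logits, allowed))
```

Token 3 has the highest raw score here, yet after masking it cannot be sampled; the remaining probability mass is renormalized over the allowed tokens.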

Code & Models

Repositories

gpoesia/certified-reasoning

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Dropout · Residual Connection · Linear Layer · Adam · Attention Dropout