Integrating Large Language Models and Reinforcement Learning for Non-Linear Reasoning

Yoav Alon; Cristina David

arXiv:2410.13501·cs.LG·October 18, 2024

TL;DR

This paper introduces a hybrid architecture combining Large Language Models with Reinforcement Learning to enhance non-linear reasoning and long-term planning, showing improved performance on program equivalence tasks.

Contribution

Integrating RL-guided exploration with an LLM improves long-term reasoning and decision-making, outperforming Chain of Thought (CoT) and Tree of Thoughts (ToT) on the program equivalence task.

Findings

01. Outperforms Chain of Thought and Tree of Thoughts methods

02. Enables non-linear reasoning with backtracking capabilities

03. Improves accuracy on program equivalence classification

Abstract

Large Language Models (LLMs) have been shown to struggle with long-term planning, which may be caused by the limited way in which they explore the space of possible solutions. We propose an architecture where a Reinforcement Learning (RL) Agent guides an LLM's exploration of the solution space: (1) the Agent has access to domain-specific information, and can therefore make decisions about the quality of candidate solutions based on specific and relevant metrics, which were not explicitly considered by the LLM's training objective; (2) the LLM can focus on generating immediate next steps, without the need for long-term planning. We allow non-linear reasoning by exploring alternative paths and backtracking. We evaluate this architecture on the program equivalence task, and compare it against Chain of Thought (CoT) and Tree of Thoughts (ToT). We assess both the downstream task, denoting the binary…
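
The division of labour described in the abstract (a generator proposes immediate next steps, an agent scores them with domain-specific information and may backtrack) can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on several assumptions: a toy arithmetic proposer, `propose_next_steps`, stands in for the LLM; a hand-written greedy rule, `agent_choose`, stands in for a learned RL policy; and the task is reaching a target integer rather than program equivalence. All function names are hypothetical.

```python
from typing import List, Optional, Set, Tuple


def propose_next_steps(state: int) -> List[int]:
    """Hypothetical stand-in for the LLM: proposes immediate next states only."""
    return [state * 2, state + 3]


def domain_score(state: int, target: int) -> float:
    """Hypothetical domain-specific metric visible to the agent (higher is better)."""
    return -abs(target - state)


def agent_choose(candidates: List[int], visited: Set[int], target: int) -> Optional[int]:
    """Hand-written greedy rule standing in for a learned RL policy (assumption).

    Returns the candidate to move to, or None to signal a backtrack.
    """
    # Domain knowledge: both toy moves only increase the state, so overshooting is a dead end.
    viable = [c for c in candidates if c not in visited and c <= target]
    if not viable:
        return None
    return max(viable, key=lambda c: domain_score(c, target))


def guided_search(start: int, target: int, max_steps: int = 100) -> Tuple[Optional[List[int]], int]:
    """Agent-guided exploration with backtracking; returns (path or None, backtrack count)."""
    path = [start]
    visited = {start}
    backtracks = 0
    for _ in range(max_steps):
        state = path[-1]
        if state == target:
            return path, backtracks
        choice = agent_choose(propose_next_steps(state), visited, target)
        if choice is None:
            if len(path) == 1:
                return None, backtracks  # root exhausted, nothing left to explore
            path.pop()  # backtrack to the previous state and try an alternative
            backtracks += 1
        else:
            visited.add(choice)
            path.append(choice)
    return None, backtracks


path, backtracks = guided_search(start=1, target=10)
print(path, backtracks)  # [1, 4, 7, 10] 1
```

In this toy run the greedy rule first follows 1, 4, 8, finds that both proposals from 8 overshoot the target, backtracks to 4, and then reaches 10 through 7. A learned policy would replace `agent_choose`, and the proposer would be an actual model call.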

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Natural Language Processing Techniques

Methods: Focus