Can LLMs Reason in the Wild with Programs?

Yuan Yang; Siheng Xiong; Ali Payani; Ehsan Shareghi; Faramarz Fekri

arXiv:2406.13764 · cs.CL · June 21, 2024

TL;DR

This paper introduces the challenging task of reasoning in the wild for LLMs, emphasizing their limitations in ambiguous, real-world problems, and demonstrates how fine-tuning can improve their reasoning capabilities.

Contribution

The paper defines a new, more realistic reasoning task, builds a diverse dataset for it, and evaluates LLMs on that task, highlighting their limitations and the improvement available through fine-tuning.

Findings

01 LLMs struggle with ambiguous and hybrid reasoning problems.

02 Performance drops significantly on complex reasoning tasks.

03 Fine-tuning improves LLM reasoning accuracy.

Abstract

Large Language Models (LLMs) have shown superior capability to solve reasoning problems with programs. While this is a promising direction, most such frameworks are trained and evaluated in settings with prior knowledge of the task requirements. However, as LLMs become more capable, it is necessary to assess their reasoning abilities in more realistic scenarios: many real-world problems are open-ended, have ambiguous scope, and often require multiple formalisms to solve. To investigate this, we introduce the task of reasoning in the wild, where an LLM is tasked with solving a reasoning problem of unknown type by identifying the subproblems and their corresponding formalisms, and writing a program to solve each subproblem, guided by a tactic. We create a large tactic-guided trajectory dataset containing detailed solutions to a diverse set of reasoning problems, ranging from well-defined…
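The workflow the abstract describes (split a problem into subproblems, name a formalism for each, write a program per subproblem) can be pictured with a small sketch. Everything here is hypothetical and for illustration only: the `Subproblem` class, the formalism labels, and the hand-written programs are assumptions, not the paper's trajectory format or code, and the tactic that guides the model is not modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subproblem:
    description: str  # natural-language statement of the subproblem
    formalism: str    # hypothetical label, e.g. "arithmetic" or "logic"
    program: str      # Python source that binds its answer to `result`


def run_program(source: str) -> object:
    """Execute one generated program and return the value bound to `result`."""
    namespace: dict = {}
    # Acceptable for a toy demo only; untrusted model output needs a sandbox.
    exec(source, namespace)
    return namespace["result"]


def solve(subproblems: List[Subproblem]) -> List[Tuple[str, object]]:
    """Run each subproblem's program in order, collecting (formalism, answer)."""
    return [(sp.formalism, run_program(sp.program)) for sp in subproblems]


# Hand-written stand-in for what a model might emit on a hybrid problem.
trajectory = [
    Subproblem("Total cost of 3 items at 4 each", "arithmetic", "result = 3 * 4"),
    Subproblem("Is the total within a budget of 10?", "logic", "result = (3 * 4) <= 10"),
]
answers = solve(trajectory)
```

Keeping the formalism as an explicit field on each subproblem mirrors the idea that the problem type is not known in advance and must be identified per subproblem rather than fixed for the whole task.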

Code & Models

Repositories

gblackout/reason-in-the-wild
Official

Datasets

yuan-yang/ReWild
dataset · 4 downloads

Videos

Can LLMs Reason in the Wild with Programs? · underline

Taxonomy

Topics: Cooperative Studies and Economics

Methods: Sparse Evolutionary Training