Can LLMs Reason in the Wild with Programs?
Yuan Yang, Siheng Xiong, Ali Payani, Ehsan Shareghi, Faramarz Fekri

TL;DR
This paper introduces the challenging task of reasoning in the wild for LLMs, emphasizing their limitations in ambiguous, real-world problems, and demonstrates how fine-tuning can improve their reasoning capabilities.
Contribution
It defines a new realistic reasoning task, creates a diverse dataset, and evaluates LLMs' performance, highlighting limitations and potential improvements through fine-tuning.
Findings
LLMs struggle with ambiguous and hybrid reasoning problems.
Performance drops significantly on complex reasoning tasks.
Fine-tuning improves LLM reasoning accuracy.
Abstract
Large Language Models (LLMs) have shown superior capability to solve reasoning problems with programs. While this is a promising direction, most such frameworks are trained and evaluated in settings with prior knowledge of task requirements. However, as LLMs become more capable, it is necessary to assess their reasoning abilities in more realistic scenarios, where many real-world problems are open-ended with ambiguous scope and often require multiple formalisms to solve. To investigate this, we introduce the task of reasoning in the wild, where an LLM is tasked to solve a reasoning problem of unknown type by identifying the subproblems and their corresponding formalisms, and writing a program to solve each subproblem, guided by a tactic. We create a large tactic-guided trajectory dataset containing detailed solutions to a diverse set of reasoning problems, ranging from well-defined…
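To make the task description concrete, the following is a minimal sketch of the decompose-then-program idea: a problem is split into subproblems, each tagged with a formalism and solved by its own small program. It is not the authors' implementation. The `Subproblem` class, the `solve` function, the formalism labels, and the toy apples problem are all hypothetical and invented for illustration, and the decomposition is written by hand here rather than produced by an LLM.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Subproblem:
    """One step of a hand-written trajectory (hypothetical structure)."""
    name: str
    formalism: str  # illustrative labels such as "arithmetic" or "logic"
    program: Callable[[Dict[str, Any]], Any]


def solve(subproblems: List[Subproblem]) -> Dict[str, Any]:
    """Run each subproblem's program in order, sharing earlier results."""
    results: Dict[str, Any] = {}
    for sp in subproblems:
        results[sp.name] = sp.program(results)
    return results


# Toy hybrid problem (invented for this sketch):
# Alice has 3 more apples than Bob, and together they have 11.
# Anyone with more than 5 apples is a "big holder". Is Alice one?

def alice_apples(_: Dict[str, Any]) -> int:
    # a = b + 3 and a + b = 11  =>  2a = 14  =>  a = 7
    total, diff = 11, 3
    return (total + diff) // 2


def is_big_holder(results: Dict[str, Any]) -> bool:
    # Rule: more than 5 apples implies "big holder".
    return results["alice_apples"] > 5


trajectory = [
    Subproblem("alice_apples", "arithmetic", alice_apples),
    Subproblem("is_big_holder", "logic", is_big_holder),
]
answers = solve(trajectory)
print(answers)  # {'alice_apples': 7, 'is_big_holder': True}
```

The second subproblem depends on the first one's result, which is the kind of hybrid problem the abstract refers to when it says a single problem may require multiple formalisms.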
Taxonomy
Topics: Cooperative Studies and Economics
Methods: Sparse Evolutionary Training
