Steering Large Language Models between Code Execution and Textual Reasoning

Yongchao Chen; Harsh Jhamtani; Srinagesh Sharma; Chuchu Fan; Chi Wang

arXiv:2410.03524·cs.CL·March 4, 2025

TL;DR

This paper investigates how to steer large language models effectively between code execution and textual reasoning. It reveals patterns and limitations in existing steering approaches and proposes methods that improve task-solving performance across various models and tasks.

Contribution

It introduces three novel methods for better steering LLMs between code and text generation, addressing current limitations and improving performance across multiple tasks and models.

Findings

01 Models use code or text differently depending on task complexity and model size.

02 Results from LLM-generated code are not always superior to textual reasoning.

03 Proposed methods significantly improve steering accuracy and efficiency.

Abstract

While much recent research focuses on enhancing the textual reasoning capabilities of Large Language Models (LLMs) by optimizing multi-agent frameworks or reasoning chains, several benchmark tasks can be solved with 100% success through direct coding, which is more scalable and avoids the computational overhead of textual iteration and search. Textual reasoning has inherent limitations on tasks that involve math, logic, optimization, and search, and these limitations are unlikely to be resolved simply by scaling up model and data size. The recently released OpenAI GPT Code Interpreter and multi-agent frameworks such as AutoGen have demonstrated remarkable proficiency in integrating code generation and execution to solve complex tasks with LLMs. However, based on our experiments on 7 existing popular methods for steering code/text generation in both single-…

Code & Models

Repositories

yongchao98/codesteer-v1.0
pytorch

Models

yongchao98/CodeSteer-v1
model · 17 dl · ♡ 8

Datasets

yongchao98/SymBench
dataset · 1.1k dl

Taxonomy

Topics: Software Engineering Research · Software Reliability and Analysis Research · Software Testing and Debugging Techniques

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Weight Decay · Cosine Annealing · Dropout · Byte Pair Encoding · Softmax · Attention Dropout