o1-Coder: an o1 Replication for Coding
Yuxiang Zhang, Shangxi Wu, Yuqi Yang, Jiangming Shu, Jinlin Xiao, Chao Kong, Jitao Sang

TL;DR
This paper presents O1-CODER, a replication of OpenAI's o1 model focused on coding, integrating reinforcement learning and Monte Carlo Tree Search to improve systematic reasoning in code generation.
Contribution
It introduces a framework combining RL and MCTS for coding tasks, including a test case generator and iterative fine-tuning, addressing deployment challenges of o1-like models.
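To illustrate why a test case generator matters for RL on code, here is a minimal sketch of an outcome reward computed as the fraction of test cases a candidate program passes. This is an assumption about how such a signal could be scored, not the report's implementation; `pass_rate`, `buggy_abs`, and the `(args, expected)` test-case format are hypothetical names chosen for the example.

```python
from typing import Any, Callable, Iterable, Tuple

def pass_rate(candidate: Callable[..., Any],
              test_cases: Iterable[Tuple[tuple, Any]]) -> float:
    """Return the fraction of (args, expected) pairs the candidate satisfies.

    Hypothetical reward function: the summarized report does not specify
    how test results are turned into a scalar reward.
    """
    cases = list(test_cases)
    if not cases:
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if candidate(*args) == expected:
                passed += 1
        except Exception:
            # A crashing candidate fails that test case; the scorer keeps going.
            pass
    return passed / len(cases)

def buggy_abs(x):
    # Deliberately wrong for negative inputs.
    return x

# Hand-written test cases standing in for generator output.
cases = [((3,), 3), ((-2,), 2), ((0,), 0), ((-7,), 7)]
print(pass_rate(abs, cases))
print(pass_rate(buggy_abs, cases))
```

A graded score like this gives the policy a denser signal than a single pass/fail bit, but it is only as trustworthy as the test cases behind it, which is the gap a trained generator is meant to close.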
Findings
Initial results show promise in code reasoning capabilities.
The framework facilitates a transition to System-2 thinking in AI models.
Open-source code and datasets are provided for further research.
Abstract
The technical report introduces O1-CODER, an attempt to replicate OpenAI's o1 model with a focus on coding tasks. It integrates reinforcement learning (RL) and Monte Carlo Tree Search (MCTS) to enhance the model's System-2 thinking capabilities. The framework includes training a Test Case Generator (TCG) for standardized code testing, using MCTS to generate code data with reasoning processes, and iteratively fine-tuning the policy model to initially produce pseudocode and then generate the full code. The report also addresses the opportunities and challenges in deploying o1-like models in real-world applications, suggesting transitioning to the System-2 paradigm and highlighting the imperative for world model construction. Updated model progress and experimental results will be reported in subsequent versions. All source code, curated datasets, as well as the derived models are…
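The abstract does not say which selection rule the tree search uses. As a rough sketch of how MCTS can balance exploring new reasoning steps against exploiting promising ones, the block below uses the standard UCT (UCB1) score. The `Node` class, its fields, and the exploration constant `c = 1.4` are assumptions made for illustration, not details from the report.

```python
import math
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """One candidate reasoning step (e.g. a pseudocode fragment) in the tree."""
    step: str
    visits: int = 0
    total_reward: float = 0.0
    children: List["Node"] = field(default_factory=list)

def uct_score(child: Node, parent_visits: int, c: float = 1.4) -> float:
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # always try unvisited steps first
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore

def select_child(parent: Node) -> Node:
    """Pick the child with the highest UCT score."""
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits))

def backpropagate(path: List[Node], reward: float) -> None:
    """Add a rollout reward to every node on the path from root to leaf."""
    for node in path:
        node.visits += 1
        node.total_reward += reward

# Toy tree: two visited steps and one unvisited step under the root.
a = Node("loop over items", visits=6, total_reward=3.0)
b = Node("sort first", visits=4, total_reward=3.0)
c_node = Node("use a hash map")
root = Node("root", visits=10, children=[a, b, c_node])
print(select_child(root).step)
```

In a setup like the one the abstract describes, the reward passed to `backpropagate` would plausibly come from running the finished code against test cases, so that steps leading to passing programs accumulate higher value.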
Taxonomy
Topics: Computability, Logic, AI Algorithms · Algorithms and Data Compression · Advanced Data Compression Techniques
