TL;DR
This paper introduces CodePDE, an inference framework that uses large language models (LLMs) to generate PDE solvers. It demonstrates promising performance and offers insights into the capabilities and limitations of LLMs for this task.
Contribution
The paper pioneers the use of LLMs for PDE solver generation through an inference framework and evaluates the models' reasoning, debugging, and self-refinement abilities.
Findings
LLMs can effectively generate PDE solvers with proper inference strategies
Trade-offs exist between solver reliability and complexity in LLM-generated solutions
Insights into failure modes, and design principles for LLM-driven PDE-solving agents
Abstract
Partial differential equations (PDEs) are fundamental to modeling physical systems, yet solving them remains a complex challenge. Traditional numerical solvers rely on expert knowledge to implement and are computationally expensive, while neural-network-based solvers require large training datasets and often lack interpretability. In this work, we frame PDE solving as a code generation task and introduce CodePDE, the first inference framework for generating PDE solvers using large language models (LLMs). With CodePDE, we present a thorough evaluation of the critical capabilities of LLMs for PDE solving: reasoning, debugging, self-refinement, and test-time scaling. CodePDE shows that, with advanced inference-time algorithms and scaling strategies, LLMs can achieve strong performance across a range of representative PDE problems. We also identify novel insights into LLM-driven solver generation,…
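The abstract describes a framework that combines solver generation, debugging, and self-refinement. The sketch below shows one way such a loop could be wired up in Python; it is not CodePDE's implementation. `run_pipeline`, `generate` (a stand-in for an LLM call), and `execute` (which runs candidate code and returns its nRMSE or an error message) are hypothetical names, and the prompt wording and retry limits are assumptions.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signatures (assumptions, not CodePDE's API):
#   generate(prompt) -> solver source code as a string (e.g. from an LLM)
#   execute(code)    -> (nrmse, None) on success, or (None, error_message) on failure
Generate = Callable[[str], str]
Execute = Callable[[str], Tuple[Optional[float], Optional[str]]]


def run_pipeline(task_spec: str, generate: Generate, execute: Execute,
                 max_debug: int = 3, max_refine: int = 2):
    """Generate -> debug -> evaluate -> refine loop (illustrative sketch only)."""
    prompt = task_spec
    best_code: Optional[str] = None
    best_err = float("inf")
    for _ in range(max_refine + 1):
        code = generate(prompt)
        err, msg = execute(code)
        # Debugging: feed the execution error back until the code runs or we give up.
        attempts = 0
        while err is None and attempts < max_debug:
            code = generate(f"{task_spec}\n\nThis code failed with:\n{msg}\n\n{code}\n\nFix it.")
            err, msg = execute(code)
            attempts += 1
        if err is None:
            continue  # candidate never ran; try again from the current prompt
        # Evaluation: keep the most accurate solver seen so far.
        if err < best_err:
            best_code, best_err = code, err
        # Refinement: show the best solver and its score, and ask for a more accurate one.
        prompt = (f"{task_spec}\n\nCurrent best solver (nRMSE = {best_err:.3e}):\n"
                  f"{best_code}\n\nImprove its accuracy.")
    return best_code, best_err
```

Because `generate` and `execute` are passed in as callables, the control flow can be exercised with stubs before connecting a real model or a sandboxed executor.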
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
+ **Pioneering Exploration**: The work courageously explores a novel paradigm of using LLMs for numerical code generation, opening up new research directions in AI-enabled scientific computing.
+ **Comprehensive Evaluation**: Extensive benchmarking across 16 LLMs and 5 PDE families provides valuable data for the community.
+ **Practical Framework**: The debugging and refinement mechanisms demonstrate real utility for improving code generation reliability in scientific contexts.
+ **Insightful An…
+ **Limited Forward-Looking Insight**: While empirically thorough, the paper misses an opportunity to discuss in depth how this LLM-driven approach might evolve to complement rather than replace traditional numerical methods, and what unique advantages the fusion might bring.
+ **Practical Deployment Concerns**: The generated solvers, while accurate, lack the performance optimizations and battle-testing of established numerical libraries, limiting immediate practical utility.
+ **Motivation Gap**:…
1. Comprehensive & Rigorous Experiments: A major strength is the experimental breadth. The authors test 16 LLMs on 5 PDE benchmarks and compare them comprehensively against numerical solvers, specialized software, multiple neural solvers (FNO, PINN, etc.), and other agentic workflows.
2. Deep Insight on Interpretability: The failure-mode analysis for the Reaction-Diffusion equation in Section 5.7 is excellent. It perfectly illustrates the core advantage of this method over "black-box" neural sol…
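The review above singles out the Reaction-Diffusion equation. For readers unfamiliar with what a generated "solver" looks like as code, here is a hypothetical minimal example of such a program, not one produced by CodePDE: an explicit finite-difference scheme (forward Euler in time, central differences in space) for the 1D equation u_t = nu * u_xx + rho * u * (1 - u) on a periodic grid. The equation form, the function name `solve_reaction_diffusion`, and its interface are assumptions chosen for illustration.

```python
import math


def solve_reaction_diffusion(u0, dx, dt, n_steps, nu, rho):
    """Explicit FTCS scheme for u_t = nu * u_xx + rho * u * (1 - u), periodic in x.

    Hypothetical interface for illustration; returns the solution after n_steps.
    """
    # Forward Euler on the diffusion term is only stable if dt <= dx^2 / (2 * nu).
    if nu > 0 and dt > dx * dx / (2.0 * nu):
        raise ValueError("dt exceeds the explicit stability limit dx^2 / (2 * nu)")
    u = list(u0)
    n = len(u)
    r = nu * dt / (dx * dx)
    for _ in range(n_steps):
        # Build the new time level from the old one. u[i - 1] wraps to u[-1] at i = 0
        # and (i + 1) % n wraps at the right edge, giving periodic boundaries.
        u = [
            u[i]
            + r * (u[(i + 1) % n] - 2.0 * u[i] + u[i - 1])
            + dt * rho * u[i] * (1.0 - u[i])
            for i in range(n)
        ]
    return u


if __name__ == "__main__":
    n = 64
    dx = 1.0 / n
    # Gaussian bump centred at x = 0.5 as the initial condition.
    u0 = [math.exp(-100.0 * (i * dx - 0.5) ** 2) for i in range(n)]
    u = solve_reaction_diffusion(u0, dx, dt=1e-4, n_steps=1000, nu=0.5, rho=1.0)
    print(max(u))
```

Checking the explicit stability limit up front turns a silent numerical blow-up into an immediate, readable error.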
1. Dimensionality Limitation: The evaluation is focused on 1D and 2D PDEs. The true challenge for PDE solvers (the "curse of dimensionality") lies in high-dimensional problems. It is unclear how this framework would scale, as the complexity of the solver code (e.g., 3D FDM stencils) would increase dramatically.
2. Practicality of Refinement Signal: Step 5 (Solver Refinement) relies on nRMSE as the feedback signal, which requires a ground-truth solution. In real-world scientific discovery, the en…
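The reviewer's concern is that the refinement signal depends on a reference solution. To make that dependency concrete, a common definition of nRMSE (used, for example, in PDEBench) is the L2 norm of the error divided by the L2 norm of the reference. Whether the paper normalizes per sample, per time step, or over a whole trajectory is not stated on this page, so treat this as an assumed definition; `nrmse` is a hypothetical helper.

```python
import math


def nrmse(u_pred, u_ref):
    """Normalized RMSE, assumed here to be ||u_pred - u_ref||_2 / ||u_ref||_2."""
    if len(u_pred) != len(u_ref):
        raise ValueError("prediction and reference must have the same length")
    num = math.sqrt(sum((p - r) ** 2 for p, r in zip(u_pred, u_ref)))
    den = math.sqrt(sum(r ** 2 for r in u_ref))
    if den == 0.0:
        raise ValueError("reference solution has zero norm")
    return num / den
```

The `u_ref` argument is exactly what is unavailable in open-ended scientific problems, which is the point the reviewer raises.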
* LLM abilities are constantly improving. This paper presents an API to leverage LLMs by framing PDE solving as a code generation task, enabling LLMs to produce solver code directly from natural language.
* CodePDE integrates task specification, code generation, debugging, evaluation, and refinement in a structured pipeline.
* The evaluation is systematic. The paper analyses 16 LLMs across five PDE families using metrics like nRMSE, convergence rate, and execution time. The paper clearly prese…
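Convergence rate is one of the metrics this review lists. The paper's exact procedure is not described on this page; a standard way to estimate an observed order of accuracy is to measure the error e at successive grid spacings h and compute p = log(e1 / e2) / log(h1 / h2) for each consecutive pair. `observed_order` is a hypothetical helper implementing that textbook estimate, not the paper's code.

```python
import math


def observed_order(h, err):
    """Estimate observed convergence orders from errors at successive grid spacings.

    h:   grid spacings, e.g. [0.1, 0.05, 0.025]
    err: corresponding positive errors (e.g. nRMSE against a reference)
    Returns one estimate per consecutive pair: log(e1 / e2) / log(h1 / h2).
    """
    if len(h) != len(err) or len(h) < 2:
        raise ValueError("need at least two (h, err) pairs of equal length")
    return [
        math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
        for i in range(len(h) - 1)
    ]
```

A second-order scheme should give estimates close to 2 once the grid is fine enough; a value that drifts away from the expected order is a hint that the generated solver has a discretization bug.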
> Takeaways. LLMs can improve code for better accuracy using simple performance feedback. Interestingly, the best models at generating code are not always the best at refining it, suggesting these are two different skills.
* (L356 above): Perhaps reporting percentage improvements in Table 1 between "CodePDE: Reasoning + Debugging (best of 32)" and "CodePDE: Reasoning + Debugging + Refinement (best of 12)" would help justify this claim, whose reasoning is currently unclear.
* Some mixed r…
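The review contrasts "best of 32" and "best of 12" configurations and asks for percentage improvements. One plausible reading of best-of-N, assumed here rather than taken from the paper, is: generate N candidate solvers, score each with a lower-is-better metric such as nRMSE, and keep the best. The sketch also computes the relative improvement the reviewer asks for. `best_of_n`, `percent_improvement`, and the convention that a crashing candidate scores infinity are illustrative assumptions.

```python
import math
from typing import Callable, Iterable, Optional, Tuple


def best_of_n(candidates: Iterable[str], score: Callable[[str], float]) -> Tuple[str, float]:
    """Return the candidate with the lowest score (lower is better, e.g. nRMSE)."""
    best: Optional[str] = None
    best_score = math.inf
    for cand in candidates:
        try:
            s = score(cand)
        except Exception:
            s = math.inf  # a candidate that crashes is treated as the worst possible
        if s < best_score:  # NaN compares False, so NaN scores are never selected
            best, best_score = cand, s
    if best is None:
        raise RuntimeError("no candidate produced a finite score")
    return best, best_score


def percent_improvement(before: float, after: float) -> float:
    """Relative reduction of a lower-is-better error metric, in percent."""
    if before <= 0:
        raise ValueError("baseline error must be positive")
    return 100.0 * (before - after) / before
```

Because `score` is passed in, the same selection logic works with any lower-is-better metric.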