TL;DR
FD-Bench introduces a comprehensive, modular, and fair benchmark for evaluating data-driven fluid simulation models, addressing reproducibility and comparability issues in the field.
Contribution
It provides a unified framework with standardized evaluation protocols, enabling fair comparison of models and traditional solvers, and offers an extensible codebase for future research.
Findings
Evaluated 85 baseline models across 10 flow scenarios.
Established a comprehensive leaderboard for data-driven fluid models.
Analyzed generalization across resolutions, initial conditions, and time windows.
Abstract
Data-driven modeling of fluid dynamics has advanced rapidly with neural PDE solvers, yet a fair and strong benchmark remains fragmented due to the absence of unified PDE datasets and standardized evaluation protocols. Although architectural innovations are abundant, fair assessment is further impeded by the lack of clear disentanglement between spatial, temporal and loss modules. In this paper, we introduce FD-Bench, the first fair, modular, comprehensive and reproducible benchmark for data-driven fluid simulation. FD-Bench systematically evaluates 85 baseline models across 10 representative flow scenarios under a unified experimental setup. It provides four key contributions: (1) a modular design enabling fair comparisons across spatial, temporal, and loss function modules; (2) the first systematic framework for direct comparison with traditional numerical solvers; (3) fine-grained…
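One way to read the modular design described in the abstract is as a controlled-variable experiment: start from a baseline configuration and change exactly one module (spatial, temporal, or loss) at a time. The sketch below illustrates that idea only. The function name `controlled_variants` and the module labels such as `"s0"` or `"t1"` are hypothetical placeholders, not names from the FD-Bench codebase.

```python
def controlled_variants(baseline, options):
    """Return configs that differ from `baseline` in exactly one module.

    `baseline` maps an axis name to the baseline choice; `options` maps the
    same axis names to candidate choices. All names are placeholders.
    """
    variants = []
    for axis, choices in options.items():
        for choice in choices:
            if choice != baseline[axis]:
                # Copy the baseline and override a single axis.
                variants.append({**baseline, axis: choice})
    return variants


# Hypothetical labels for illustration only.
baseline = {"spatial": "s0", "temporal": "t0", "loss": "l0"}
options = {
    "spatial": ["s0", "s1", "s2"],
    "temporal": ["t0", "t1"],
    "loss": ["l0", "l1"],
}
variants = controlled_variants(baseline, options)
```

Because every variant shares all but one module with the baseline, a change in error can be attributed to that module rather than to the whole architecture.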
Peer Reviews
Decision: Submitted to ICLR 2026
1. This paper accurately identifies the core pain point of "fragmented evaluation" in the current field of neural PDE solvers, and the proposal of FD-Bench has clear practical significance.
2. This paper unifies and abstracts various existing methods into a quadruple of "spatial encoding + temporal encoding + loss function + tricks", and designs controlled variable experiments based on this to achieve "structure-performance" decoupling.
3. This paper provides an easy-to-use and reproducible codebas
1. This paper emphasizes comparisons across 89 baselines. However, Table 3 appears to extract components from existing methods, categorize them, and then compare the methods after reorganizing the components, which doesn't seem entirely consistent with the description.
2. FD-Bench’s primary metrics are RMSE, nRMSE, and frequency-domain RMSE. These capture “fit” but do not directly assess physical consistency.
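For readers unfamiliar with the three metrics named in this review, the following is a minimal NumPy sketch of one common way to compute them on a 2D field. The exact definitions used by FD-Bench are not given here, so the normalization by the target's RMS value, the band edges, and the function names `rmse`, `nrmse`, and `band_rmse` are all assumptions made for illustration.

```python
import numpy as np


def rmse(pred, target):
    """Root-mean-square error over all grid points."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def nrmse(pred, target, eps=1e-12):
    """RMSE divided by the RMS magnitude of the target (assumed convention)."""
    scale = float(np.sqrt(np.mean(target ** 2)))
    return rmse(pred, target) / (scale + eps)


def band_rmse(pred, target, bands=((0, 4), (4, 12), (12, None))):
    """RMSE of the error spectrum inside radial wavenumber bands.

    The band edges are illustrative assumptions, not values from the paper.
    """
    # Orthonormal FFT of the error over the last two (spatial) axes.
    err_hat = np.fft.fft2(pred - target, norm="ortho")
    power = np.abs(err_hat) ** 2
    ny, nx = pred.shape[-2:]
    # Integer wavenumbers along each axis, then their radial magnitude.
    ky = np.fft.fftfreq(ny) * ny
    kx = np.fft.fftfreq(nx) * nx
    k = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2)
    results = []
    for lo, hi in bands:
        mask = k >= lo if hi is None else (k >= lo) & (k < hi)
        results.append(float(np.sqrt(power[..., mask].mean())))
    return results
```

All three quantities measure pointwise or spectral fit to the reference field. None of them checks conservation laws or other physical constraints, which is the gap the reviewer points out.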
The modularization of neural PDE solvers into spatial/temporal/loss components, along with benchmarking controlled cross-comparisons of these choices, is a useful organizing principle that helps attribute where gains come from, rather than treating methods as monoliths. The inclusion of direct comparisons to classical solvers at coarser grids and lower-order schemes (with matched error targets) is a thoughtful step toward practicality. The benchmark scope is broad (reported 89 baselines, 10 sce
*Novelty of questions vs. synthesis*: The three headline questions (which neural architecture; can neural replace numerical; which discretization; how well do models generalize) are useful but not conceptually new; several prior surveys/benchmarks have articulated similar axes. FD-Bench’s novelty is primarily scope and modular rigor, not new task formulations. (This aligns with the authors’ own positioning against prior benchmarks in Table 1.)
*Insights largely confirm established intuitions*:
- The modular design is a step toward disentangling the contributions of different components in neural PDE solvers.
- The effort to unify and standardize evaluation across a large number of models is commendable.
- The inclusion of traditional numerical solvers as baselines is a valuable addition.
- The codebase is publicly released, which may facilitate future research.
- Lack of Theoretical or Algorithmic Novelty: The paper presents a benchmarking effort rather than a methodological advance. While useful, it does not introduce new models, theoretical insights, or algorithmic improvements. The decomposition into spatial/temporal/loss modules is intuitive but not novel, and the paper does not justify why this particular decomposition is the most meaningful or complete.
- Superficial Model Decomposition: The decomposition of 89 models into modular components is
Taxonomy
Topics: Model Reduction and Neural Networks · Real-time simulation and control systems · Lattice Boltzmann Simulation Studies
