A curated dataset for data-driven turbulence modelling
Ryley McConkey, Eugene Yee, Fue-Sang Lien

TL;DR
This paper introduces the first open-source, curated dataset for machine learning-based turbulence modelling, combining RANS, DNS, and LES data across various flow cases to facilitate model development and benchmarking.
Contribution
It provides a comprehensive, structured dataset for turbulence modelling that enables rapid development and testing of machine learning approaches, filling a critical resource gap.
Findings
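The dataset includes 895,640 points, each with RANS features and matching DNS/LES labels.
It covers five reference flow configurations (periodic hills, square duct, parametric bumps, converging-diverging channel, and curved backward-facing step), each simulated with four RANS turbulence models.
It is openly available to accelerate data-driven turbulence modelling research.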
Abstract
The recent surge in machine learning augmented turbulence modelling is a promising approach for addressing the limitations of Reynolds-averaged Navier-Stokes (RANS) models. This work presents the development of the first open-source dataset, curated and structured for immediate use in machine learning augmented turbulence closure modelling. The dataset features a variety of RANS simulations with matching direct numerical simulation (DNS) and large-eddy simulation (LES) data. Four turbulence models are selected to form the initial dataset: k-ε, k-ε-φt-f, k-ω, and k-ω SST. The dataset consists of 29 cases per turbulence model, for several parametrically sweeping reference DNS/LES cases: periodic hills, square duct, parametric bumps, converging-diverging channel, and a curved backward-facing step. At each of the 895,640 points, various…
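Because the data are organized by flow case, one common way to benchmark a model on such a dataset is to hold out whole cases rather than random points, so the test set contains geometries the model has not seen. The snippet below is a minimal sketch of that idea on synthetic arrays. It does not load the actual dataset, and the names `split_by_case`, `case_ids`, `features`, and `labels`, as well as the case labels, are hypothetical placeholders rather than the dataset's own file or field names.

```python
import numpy as np

def split_by_case(case_ids, test_cases):
    """Return boolean (train, test) masks that keep all points of a case together."""
    case_ids = np.asarray(case_ids)
    test_mask = np.isin(case_ids, list(test_cases))
    return ~test_mask, test_mask

# Synthetic stand-in: three hypothetical cases with a handful of points each.
rng = np.random.default_rng(0)
case_ids = np.array(["case_a"] * 4 + ["case_b"] * 3 + ["case_c"] * 5)
features = rng.normal(size=(case_ids.size, 2))  # placeholder for RANS features
labels = rng.normal(size=case_ids.size)         # placeholder for DNS/LES labels

# Hold out every point belonging to one case for testing.
train_mask, test_mask = split_by_case(case_ids, {"case_b"})
X_train, y_train = features[train_mask], labels[train_mask]
X_test, y_test = features[test_mask], labels[test_mask]
```

Splitting on case identifiers instead of individual points avoids placing neighbouring points from the same simulation on both sides of the split, which would otherwise tend to overstate how well a model generalizes.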
