The IAEA Fusion Data Lake Project -- Accelerating AI and Big Data Applications through Open Science and FAIR Data
Daljeet Singh Gahle, Matteo Barbarino

TL;DR
The IAEA Fusion Data Lake Project aims to enhance AI and big data applications in fusion research by creating an open, FAIR-compliant data infrastructure with catalogues, storage, and federation to improve data accessibility.
Contribution
This work introduces a modern data platform supporting AI workflows in fusion research, integrating international data catalogues and demonstrating scalability through a proof of concept.
Findings
Successful integration with UKAEA's MAST Data Catalog
Demonstrated data federation capacity
Planned scalability with additional experimental catalogues
Abstract
The application of AI in fusion is a maturing field, with surrogate models and digital twins playing a key role in overcoming computational cost limitations and insufficiently characterised phenomena, and in expanding the horizon for real-time applications. The IAEA is supporting this activity through the AI for Fusion Coordinated Research Project (CRP), a five-year initiative launched in 2022, which involves 24 institutions across 11 countries. A key goal is to support the development of the modern data infrastructure required to enable agnostic AI models that can safely extrapolate into the parameter space of future fusion power plants. The IAEA is playing an active role in contributing to this data infrastructure with the Fusion Data Lake project, a modern data platform to enable the development of AI workflows in line with FAIR data principles. The platform comprises three…
