A data management system for machine learning research of tokamak
Chenguang Wan, Zhi Yu, Xiaojuan Liu, Xinghao Wen, Xi Deng, and Jiangang Li

TL;DR
This paper introduces a new data management system based on MongoDB and HDF5, optimized for machine learning research on tokamak data, addressing limitations of traditional databases like MDSplus.
Contribution
The authors developed a specialized data management system for tokamak ML research that improves data access, reliability, and training efficiency, complementing existing MDSplus infrastructure.
Findings
Supports over 3000 data channels with reliable concurrent access
Includes functions like error correction and data conversion for ML readiness
Accelerates ML model training with specialized data handling features
Abstract
In recent years, machine learning (ML) research methods have received increasing attention in the tokamak community. The conventional database for experimental data (i.e., MDSplus for tokamaks) was designed for small-group consumption and is aimed mainly at simultaneous visualization of small amounts of data. ML data access patterns differ fundamentally from traditional data access patterns, and the typical MDSplus database is increasingly showing its limitations. We developed a new data management system suitable for tokamak machine learning research based on Experimental Advanced Superconducting Tokamak (EAST) data. The data management system is based on MongoDB and Hierarchical Data Format version 5 (HDF5). Currently, the data management system holds more than 3000 channels of data. The system can provide highly reliable concurrent access. The system includes error correction,…
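The pairing of a document store with array files described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free, a plain Python list of dicts stands in for a MongoDB collection and raw float64 files stand in for HDF5 datasets. The function names (write_signal, find, read_signal), the metadata fields, and the shot and channel values are all hypothetical.

```python
import struct
import tempfile
from pathlib import Path


def write_signal(root, index, shot, channel, samples, unit):
    """Store samples in a binary file and register metadata in the index.

    `index` is a list of dicts standing in for a MongoDB collection;
    the file is a raw little-endian float64 dump standing in for an HDF5 dataset.
    """
    path = Path(root) / f"{shot}_{channel}.bin"
    path.write_bytes(struct.pack(f"<{len(samples)}d", *samples))
    index.append({
        "shot": shot,
        "channel": channel,
        "unit": unit,
        "n": len(samples),   # sample count, needed to decode the file
        "path": str(path),   # pointer from metadata to the bulk data
    })


def find(index, **criteria):
    """Return metadata documents whose fields equal all given criteria."""
    return [doc for doc in index
            if all(doc.get(key) == value for key, value in criteria.items())]


def read_signal(doc):
    """Load the samples that a metadata document points to."""
    raw = Path(doc["path"]).read_bytes()
    return list(struct.unpack(f"<{doc['n']}d", raw))


# Hypothetical usage: two shots, one shared channel name.
demo_root = tempfile.mkdtemp()
demo_index = []
write_signal(demo_root, demo_index, 1001, "ip", [0.0, 0.25, 0.5], "MA")
write_signal(demo_root, demo_index, 1002, "ip", [0.0, 0.5, 1.0], "MA")

# Select by metadata first, then read only the matching arrays.
training_set = [read_signal(doc) for doc in find(demo_index, channel="ip")]
```

The point of the split is that selection runs against small metadata records, so a training job touches bulk array data only for the shots and channels it actually needs.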
Taxonomy
Topics: Data Quality and Management · Big Data Technologies and Applications
