Real time data access log analysis system of EAST tokamak based on spark
F. Wang, Q.H. Zhang, X.Y. Sun, Y. Chen, Y.T. Wang, F. Yang

TL;DR
This paper presents a real-time log analysis system for the EAST tokamak's MDSplus data server using Spark Streaming, enabling efficient processing and visualization of large-scale log data for fusion experiment management.
Contribution
It introduces a novel real-time log analysis system based on Spark Streaming, integrating log monitoring, aggregation, and distribution technologies for large-scale data processing in fusion experiments.
Findings
System processes tens of millions of raw MDSplus log entries with latency on the order of seconds
Demonstrates steady and reliable performance
Provides valuable data management insights
Abstract
The volume of experiment data generated by the EAST device keeps growing, so the MDSplus data storage server on EAST needs to be monitored. To make it easier to manage users on the MDSplus server, a real-time log monitoring and analysis system is needed. The system's data processing framework is Spark Streaming, part of the Spark ecosystem, and its real-time input stream is derived from MDSplus logs. The system also relies on log monitoring, aggregation, and distribution technologies, using frameworks such as Flume and Kafka, which give it the capacity to process MDSplus logs at large scale. The system can process tens of millions of raw MDSplus log records with second-level latency, then model the log information and display it on the web. This report introduces the design and implementation of the overall architecture…
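The micro-batch idea behind Spark Streaming, as used in the pipeline described in the abstract, can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation and does not use Spark, Flume, or Kafka. The log line format (`timestamp user tree shot`), the field names, and the helpers `parse_line`, `micro_batches`, and `accesses_per_user` are all assumptions made for illustration, since the abstract does not describe the MDSplus log layout.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional


class AccessEvent(NamedTuple):
    """One data-access record. All fields are hypothetical."""
    timestamp: int  # seconds since some epoch
    user: str       # account that accessed the server
    tree: str       # name of the accessed data tree
    shot: int       # shot (discharge) number


def parse_line(line: str) -> Optional[AccessEvent]:
    """Parse one line of the assumed format 'timestamp user tree shot'.

    Returns None for malformed lines so they can be skipped.
    """
    parts = line.split()
    if len(parts) != 4:
        return None
    try:
        return AccessEvent(int(parts[0]), parts[1], parts[2], int(parts[3]))
    except ValueError:
        return None


def micro_batches(
    events: Iterable[AccessEvent], batch_seconds: int
) -> Dict[int, List[AccessEvent]]:
    """Group events into fixed-length batches keyed by batch start time."""
    batches: Dict[int, List[AccessEvent]] = defaultdict(list)
    for ev in events:
        start = (ev.timestamp // batch_seconds) * batch_seconds
        batches[start].append(ev)
    return dict(batches)


def accesses_per_user(
    lines: Iterable[str], batch_seconds: int = 5
) -> Dict[int, Counter]:
    """Count accesses per user within each micro-batch."""
    events = (ev for ev in map(parse_line, lines) if ev is not None)
    grouped = micro_batches(events, batch_seconds)
    return {
        start: Counter(ev.user for ev in batch)
        for start, batch in sorted(grouped.items())
    }


if __name__ == "__main__":
    sample = [
        "100 alice tree_a 70001",
        "101 bob tree_a 70001",
        "103 alice tree_b 70002",
        "106 alice tree_a 70003",
        "not a valid log line",
    ]
    for start, counts in accesses_per_user(sample).items():
        print(start, dict(counts))
```

In a deployment like the one the abstract outlines, the list of lines would presumably be replaced by a stream collected by Flume and delivered through Kafka, and the grouping and counting would run as Spark Streaming transformations rather than in a single Python process.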
Taxonomy
Topics: Magnetic confinement fusion research · Advanced Data Storage Technologies
