HepScript: A Dual-Use DSL for Human-AI Collaborative Data Analysis Workflows in High-Energy Physics
Junkun Jiao, Tong Liu, Ke Li, Weimin Song, Yipu Liao, Bolun Zhang, Beijiang Liu, Chang-Zheng Yuan, Yue Sun

TL;DR
HepScript is a dual-use domain-specific language designed to facilitate human-AI collaboration in high-energy physics data analysis, significantly reducing human coding effort and enabling autonomous AI code generation.
Contribution
The paper introduces HepScript, a formal DSL for HEP workflows that bridges human expertise and AI automation, improving efficiency and scalability in scientific analysis.
Findings
HepScript reduces human-written code by 93%.
AI agents achieve 95% success in generating executable analysis specifications.
HepScript abstracts complex software stacks into a tractable, intuitive syntax.
Abstract
The escalating data scale in High-Energy Physics (HEP) fuels a growing aspiration for higher analytical efficiency. While Large Language Models (LLMs) offer a path toward automation via agentic AI, they struggle with complex scientific workflows that require deep domain knowledge and are tightly coupled to experiment-specific codebases. To address this, we introduce a methodology centered on HepScript, a dual-use Domain-Specific Language (DSL) for HEP data analysis workflows. HepScript serves as a shared formal interface, abstracting HEP analysis logic into a constrained syntax that is both intuitive for human experts and reliably generable by AI agents. First developed for the Beijing Spectrometer III (BESIII) experiment, HepScript hides the complexity of the underlying software stack, translating high-level analysis intent into low-level, production-ready code. In our case studies,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
