Evaluating Query Languages and Systems for High-Energy Physics Data [Extended Version]
Dan Graur (1), Ingo Müller (1), Mason Proffitt (2), Ghislain Fourny (1), Gordon T. Watts (2), and Gustavo Alonso (1) ((1) Department of Computer Science, ETH Zurich; (2) Department of Physics, University of Washington)

TL;DR
This paper evaluates six data processing platforms for high-energy physics, revealing significant gaps in naturalness, conciseness, and performance of general-purpose query systems compared to domain-specific solutions.
Contribution
It provides a comprehensive analysis of existing platforms' suitability for HEP data analysis, highlighting areas for improvement in query language expressiveness and efficiency.
Findings
Most of the evaluated platforms are one to two orders of magnitude slower than the domain-specific system that particle physicists use today.
Query languages vary greatly in how naturally and concisely they express HEP query patterns.
Significant work is needed to adapt database systems for HEP research.
Abstract
In the domain of high-energy physics (HEP), query languages in general and SQL in particular have found limited acceptance. This is surprising since HEP data analysis matches the SQL model well: the data is fully structured and queried using mostly standard operators. To gain insight into why this is the case, we perform a comprehensive analysis of six diverse, general-purpose data processing platforms using an HEP benchmark. The result of the evaluation is an interesting and rather complex picture of existing solutions: their query languages vary greatly in how naturally and concisely HEP query patterns can be expressed. Furthermore, most of them are also between one and two orders of magnitude slower than the domain-specific system used by particle physicists today. These observations suggest that, while database systems and their query languages are in principle viable tools for HEP,…
Taxonomy
Topics: Advanced Data Storage Technologies · Distributed and Parallel Computing Systems · Big Data Technologies and Applications
