Readable and efficient HEP data analysis with bamboo
Pieter David

TL;DR
Bamboo is a Python-embedded domain-specific language for high-energy physics data analysis that combines concise expression, high performance, and flexibility, facilitating complex analyses with reusable components and adaptability to various data formats.
Contribution
It introduces a new analysis framework that integrates a domain-specific language with ROOT's RDataFrame and JIT compilation, enabling efficient and customizable HEP data analysis.
Findings
Approaches the performance of dedicated native code by building on ROOT's RDataFrame and the cling C++ JIT compiler.
Supports complex analyses and ships many reusable components for the NanoAOD data format.
Is currently used in several CMS Run 2 analyses.
Abstract
With the LHC continuing to collect more data and experimental analyses becoming increasingly complex, tools to efficiently develop and execute these analyses are essential. The bamboo framework defines a domain-specific language, embedded in Python, that allows the analysis logic to be expressed concisely in a functional style. The implementation, based on ROOT's RDataFrame and the cling C++ JIT compiler, approaches the performance of dedicated native code. Bamboo is currently being used for several CMS Run 2 analyses that rely on the NanoAOD data format, which will become more common in Run 3 and beyond, and for which many reusable components are included. It also provides many possibilities for customisation, which allow for straightforward adaptation to other formats and workflows.
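To illustrate the general idea of a Python-embedded DSL whose expressions end up as JIT-compiled C++, here is a minimal sketch. It is not bamboo's actual API: the `Expr` class and the `col` helper are hypothetical names invented for this illustration, and the column names `nMuon` and `MET_pt` are only assumed to exist in the input. Python operators build an expression object lazily, and the object carries the equivalent C++ source text.

```python
# Hypothetical mini-DSL for illustration only; this is NOT bamboo's API.
# Python operators build an expression whose C++ source text could later
# be handed to a JIT compiler instead of being evaluated in Python.


class Expr:
    """An expression node that carries the C++ text it stands for."""

    def __init__(self, cpp):
        self.cpp = cpp

    def _binary(self, op, other):
        # Wrap plain Python numbers as literals; reuse the text of Expr nodes.
        rhs = other.cpp if isinstance(other, Expr) else repr(other)
        return Expr(f"({self.cpp} {op} {rhs})")

    def __gt__(self, other):
        return self._binary(">", other)

    def __ge__(self, other):
        return self._binary(">=", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __and__(self, other):
        # Python's `&` stands in for the C++ logical `&&`.
        return self._binary("&&", other)


def col(name):
    """Refer to an input column by name (hypothetical helper)."""
    return Expr(name)


# Assumed scalar columns: nothing is computed here, only C++ text is built.
selection = (col("nMuon") >= 2) & (col("MET_pt") > 30.0)
print(selection.cpp)
```

A string of this kind is the sort of input that RDataFrame's `Filter` accepts as a C++ expression to be JIT-compiled. Building the expression as an object first keeps the analysis logic composable in Python while leaving the per-event evaluation to compiled code.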
