Parsing TTree Formula in Python
Aryan Roy, Jim Pivarski

TL;DR
This paper introduces 'formulate', a Python package that parses ROOT's TTreeFormula expression language with Lark and converts the parsed expressions into NumExpr and Awkward Array operations, so that TTreeFormula expressions can be evaluated on ROOT data from Python.
Contribution
The paper presents a formal BNF grammar and an LALR parser for TTreeFormula, along with a design for converting expressions into multiple data manipulation languages.
Findings
Successfully parsed TTreeFormula with Lark in Python
Converted expressions into NumExpr and Awkward Array formats
Zero dependencies make it easily integrable with Uproot
Abstract
Uproot can read ROOT files directly in pure Python but cannot (yet) compute expressions in ROOT's TTreeFormula expression language. Despite its popularity, this language has only one implementation and no formal specification. In a package called "formulate," we defined the language's syntax in standard BNF and parse it with Lark, a fast and modern parsing toolkit in Python. With formulate, users can now convert ROOT TTreeFormula expressions into NumExpr and Awkward Array manipulations. In this contribution, we describe BNF notation and the Look Ahead Left to Right (LALR) parsing algorithm, which scales linearly with expression length. We also present the challenges with interpreting TTreeFormula expressions as a functional language; some function-like forms can't be expressed as true functions. We also describe the design of the abstract syntax tree that facilitates conversion…
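The idea of a syntax tree that can be converted into more than one target language can be sketched with a few node classes and a single emitter. This is a minimal illustration under stated assumptions: the node classes (`Symbol`, `Literal`, `BinaryOp`, `Call`), the `emit` function, and its `func_prefix` parameter are all hypothetical and do not reflect formulate's actual AST or API.

```python
from dataclasses import dataclass
from typing import Tuple, Union


# Hypothetical node types; formulate's real AST may look quite different.
@dataclass(frozen=True)
class Symbol:
    name: str


@dataclass(frozen=True)
class Literal:
    value: float


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: "Node"
    right: "Node"


@dataclass(frozen=True)
class Call:
    func: str
    args: Tuple["Node", ...]


Node = Union[Symbol, Literal, BinaryOp, Call]

# C++-style logical operators mapped to elementwise array operators.
ARRAY_OPS = {"&&": "&", "||": "|"}


def emit(node: Node, func_prefix: str = "") -> str:
    """Walk the tree and emit an expression string.

    func_prefix="" gives NumExpr-style calls such as sqrt(x);
    func_prefix="np." gives NumPy-style calls such as np.sqrt(x).
    """
    if isinstance(node, Symbol):
        return node.name
    if isinstance(node, Literal):
        return repr(node.value)
    if isinstance(node, BinaryOp):
        op = ARRAY_OPS.get(node.op, node.op)
        left = emit(node.left, func_prefix)
        right = emit(node.right, func_prefix)
        return f"({left} {op} {right})"
    if isinstance(node, Call):
        args = ", ".join(emit(arg, func_prefix) for arg in node.args)
        return f"{func_prefix}{node.func}({args})"
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# Hand-built tree for sqrt(px*px + py*py); a parser would normally produce it.
PT_TREE = Call(
    "sqrt",
    (
        BinaryOp(
            "+",
            BinaryOp("*", Symbol("px"), Symbol("px")),
            BinaryOp("*", Symbol("py"), Symbol("py")),
        ),
    ),
)
```

Here `emit(PT_TREE)` gives `sqrt(((px * px) + (py * py)))` and `emit(PT_TREE, "np.")` gives `np.sqrt(((px * px) + (py * py)))`. Keeping the tree independent of any one backend means a new target only needs a new emitter, not a new parser.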
Taxonomy
Topics: Natural Language Processing Techniques
