A Grammar of Data Analysis
Xunmo Yang, Taylor Pospisil, Omkar Muralidharan, Dennis L. Sun

TL;DR
This paper introduces a grammar for data analysis focusing on metrics and dimensions, and presents Meterstick, a Python tool that applies this grammar across various data sources.
Contribution
It defines a grammar of data analysis whose primitives are metrics and dimensions, and provides Meterstick, a Python implementation that is agnostic to the underlying data source.
Findings
Meterstick supports multiple data sources including DataFrames and SQL databases.
The grammar clarifies the conceptual structure of data analysis tasks.
Meterstick enables consistent and flexible data analysis workflows.
Abstract
This paper outlines a grammar of data analysis, as distinct from grammars of data manipulation, in which the primitives are metrics and dimensions. We describe Meterstick, a Python implementation of this grammar that is agnostic to the underlying data source, which may be a DataFrame or a SQL database.
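To make the idea of a source-agnostic metric concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use Meterstick's API: the class `SumMetric`, its methods `compute_on` and `to_sql`, the table name `events`, and the sample rows are all hypothetical. The point is that one metric definition can either be evaluated on in-memory rows or rendered as a SQL query, with a dimension deciding how the result is split.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SumMetric:
    """Hypothetical metric: the sum of one column (not Meterstick's API)."""

    column: str

    def compute_on(self, rows, split_by=None):
        """Evaluate on in-memory rows (a list of dicts), optionally per dimension value."""
        if split_by is None:
            return sum(row[self.column] for row in rows)
        totals = defaultdict(int)
        for row in rows:
            totals[row[split_by]] += row[self.column]
        return dict(totals)

    def to_sql(self, table, split_by=None):
        """Render the same metric as a SQL query string for a database backend."""
        agg = f"SUM({self.column}) AS sum_{self.column}"
        if split_by is None:
            return f"SELECT {agg} FROM {table}"
        return f"SELECT {split_by}, {agg} FROM {table} GROUP BY {split_by}"


# Made-up sample data standing in for a DataFrame or a database table.
rows = [
    {"country": "US", "clicks": 3},
    {"country": "US", "clicks": 5},
    {"country": "JP", "clicks": 2},
]

clicks = SumMetric("clicks")                               # the metric
total = clicks.compute_on(rows)                            # no dimension
by_country = clicks.compute_on(rows, split_by="country")   # split by a dimension
query = clicks.to_sql("events", split_by="country")        # same metric, SQL backend
```

In this sketch the metric says what to compute and the dimension says how to slice it; where the data lives is decided only at evaluation time.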
Taxonomy
Topics: Computational Physics and Python Applications · Data Analysis with R · Modeling and Simulation Systems
