Gradual Metaprogramming
Tianyu Chen, Darshal Shetty, Jeremy G. Siek, Chao-Hong Chen, Weixi Ma, Arnaud Venet, Rocky Liu

TL;DR
This paper introduces gradual metaprogramming for Python-embedded DSLs in data engineering, enabling earlier type error detection and source code localization during code generation.
Contribution
It presents MetaGTLC, a formal calculus for gradually-typed metaprogramming, and proves its correctness with mechanized proofs in Agda.
Findings
Provides a formal foundation for gradual metaprogramming in data pipelines.
Enables early detection of type errors during code generation.
Mechanized proof of correctness in Agda confirms the approach's soundness.
Abstract
Data engineers increasingly use domain-specific languages (DSLs) to generate the code for data pipelines. Such DSLs are often embedded in Python. Unfortunately, there are challenges in debugging the generation of data pipelines: an error in a Python DSL script is often detected too late, after the execution of the script, and the source code location that triggers the error is hard to pinpoint. In this paper, we focus on the scenario where a DSL embedded in Python (so it is dynamically-typed) generates data pipeline description code that is statically-typed. We propose gradual metaprogramming to (1) provide a migration path toward statically typed DSLs, (2) immediately provide earlier detection of code generation type errors, and (3) report the source code location responsible for the type error. Gradual metaprogramming accomplishes this by type checking code fragments and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
