Homomorphism Calculus for User-Defined Aggregations
Ziteng Wang, Ruijie Fang, Linus Zheng, Dixin Tang, Isil Dillig

TL;DR
This paper introduces a homomorphism calculus to verify and construct merge operators for user-defined aggregation functions, enabling efficient parallel and incremental data processing in frameworks like Spark and Flink.
Contribution
The paper presents a calculus that determines whether a UDAF is a homomorphism and, when it is, constructs the corresponding merge operator. An algorithm based on the calculus outperforms existing synthesizers.
Findings
The algorithm outperforms two leading synthesizers on real-world UDAFs.
The calculus can both verify and refute whether a UDAF is a dataframe homomorphism.
Constructed merge operators enable incremental and parallel computation.
Abstract
Data processing frameworks like Apache Spark and Flink provide built-in support for user-defined aggregation functions (UDAFs), enabling the integration of domain-specific logic. However, for these frameworks to support efficient UDAF execution, the function needs to satisfy a homomorphism property, which ensures that partial results from independent computations can be merged correctly. Motivated by this problem, this paper introduces a novel homomorphism calculus that can both verify and refute whether a UDAF is a dataframe homomorphism. If so, our calculus also enables the construction of a corresponding merge operator which can be used for incremental computation and parallel execution. We have implemented an algorithm based on our proposed calculus and evaluate it on real-world UDAFs, demonstrating that our approach significantly outperforms two leading…
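To make the homomorphism property concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical mean UDAF whose intermediate state is a (sum, count) pair; the names accumulate, aggregate, merge, finish, and is_homomorphic_on are illustrative and do not come from the paper, Spark, or Flink. The check only tests the property on given inputs, whereas the calculus described above is meant to verify or refute it in general.

```python
from functools import reduce

# Hypothetical UDAF: mean, with intermediate state (sum, count).
INIT = (0, 0)

def accumulate(state, x):
    """Fold one row into the running state."""
    s, n = state
    return (s + x, n + 1)

def aggregate(rows):
    """Sequential aggregation over a list of rows."""
    return reduce(accumulate, rows, INIT)

def merge(a, b):
    """Merge operator: combines states computed on disjoint partitions."""
    return (a[0] + b[0], a[1] + b[1])

def finish(state):
    """Turn the final state into the user-visible result."""
    s, n = state
    return s / n if n else None

def is_homomorphic_on(left, right):
    """Test (not prove) that aggregating the concatenation equals
    merging the two partial results, for these specific inputs."""
    return aggregate(left + right) == merge(aggregate(left), aggregate(right))

# Parallel use: aggregate partitions independently, then merge.
partial_a = aggregate([1, 2, 3])
partial_b = aggregate([4, 5])
combined = merge(partial_a, partial_b)

# Incremental use: fold newly arrived rows into a state, then merge it in.
updated = merge(combined, aggregate([6]))
```

Here the state carries both the sum and the count because the mean alone cannot be merged without knowing how many rows each partition contributed; choosing a state that admits such a merge is the kind of question the paper addresses.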
