Homomorphism Calculus for User-Defined Aggregations

Ziteng Wang; Ruijie Fang; Linus Zheng; Dixin Tang; Isil Dillig

arXiv:2508.15109 · cs.PL · August 22, 2025


TL;DR

This paper introduces a homomorphism calculus that checks whether a user-defined aggregation function (UDAF) is a homomorphism and, if so, constructs its merge operator, enabling efficient parallel and incremental data processing in frameworks like Spark and Flink.

Contribution

The paper presents a calculus that can both verify and refute whether a UDAF is a dataframe homomorphism and, when it is, construct the corresponding merge operator. An algorithm based on the calculus outperforms two leading synthesizers on real-world UDAFs.

Findings

1. Our algorithm outperforms two leading synthesizers on real-world UDAFs.

2. The calculus effectively verifies homomorphism properties in UDAFs.

3. Constructed merge operators enable incremental and parallel computation.
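
As a minimal sketch of what a merge operator buys in practice, the Python below uses a hand-written accumulator for the mean, a (sum, count) pair, together with a hand-written merge operator. The names `accumulate`, `merge`, and `finish` are hypothetical and chosen for illustration. This is not the paper's algorithm or its output, only an example of how a merge operator can be used once one is available.

```python
from functools import reduce

# Hypothetical UDAF for the mean, kept as a (sum, count) accumulator.
def accumulate(rows):
    """Aggregate one chunk of rows sequentially."""
    total, count = 0, 0
    for x in rows:
        total += x
        count += 1
    return (total, count)

def merge(a, b):
    """Hand-written merge operator: combine two partial accumulators."""
    return (a[0] + b[0], a[1] + b[1])

def finish(acc):
    """Turn an accumulator into the final mean (None for empty input)."""
    total, count = acc
    return total / count if count else None

data = [4, 8, 15, 16, 23, 42]

# Parallel-style execution: aggregate partitions independently, then merge.
partitions = [data[0:2], data[2:4], data[4:6]]
parallel_acc = reduce(merge, (accumulate(p) for p in partitions))

# Incremental execution: reuse the old accumulator when new rows arrive.
old_acc = accumulate(data[:4])
incremental_acc = merge(old_acc, accumulate(data[4:]))

# Both strategies agree with a single sequential pass over all rows.
assert parallel_acc == incremental_acc == accumulate(data)
print(finish(parallel_acc))  # 18.0
```

In the incremental case only the new rows are scanned; the earlier rows are represented entirely by `old_acc`.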

Abstract

Data processing frameworks like Apache Spark and Flink provide built-in support for user-defined aggregation functions (UDAFs), enabling the integration of domain-specific logic. However, for these frameworks to support efficient UDAF execution, the function needs to satisfy a homomorphism property, which ensures that partial results from independent computations can be merged correctly. Motivated by this problem, this paper introduces a novel homomorphism calculus that can both verify and refute whether a UDAF is a dataframe homomorphism. If so, our calculus also enables the construction of a corresponding merge operator which can be used for incremental computation and parallel execution. We have implemented an algorithm based on our proposed calculus and evaluate it on real-world UDAFs, demonstrating that our approach significantly outperforms two leading…
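
The homomorphism property mentioned in the abstract can be illustrated with a small Python sketch. It assumes the property takes the usual form agg(xs ++ ys) = merge(agg(xs), agg(ys)), where ++ is concatenation; the paper's formal definition of a dataframe homomorphism may differ in detail. The functions `agg_max`, `agg_mean`, `holds_on_samples`, and `find_refutation` are hypothetical, and the brute-force checks below are not the paper's calculus. They only show what verifying and refuting the property can look like on concrete inputs.

```python
from itertools import product

def agg_max(rows):
    """Hypothetical UDAF: maximum of a non-empty list."""
    return max(rows)

def agg_mean(rows):
    """Hypothetical UDAF: returns only the mean, discarding the count."""
    return sum(rows) / len(rows)

def holds_on_samples(agg, merge, samples):
    """Test agg(xs + ys) == merge(agg(xs), agg(ys)) on sample pairs.

    Passing is evidence only; it is not a proof of the property.
    """
    return all(agg(xs + ys) == merge(agg(xs), agg(ys))
               for xs, ys in product(samples, repeat=2))

def find_refutation(agg, samples):
    """Search for inputs showing that no merge operator can exist.

    If two pairs of chunks have identical partial results but different
    combined results, no function of the partial results can be correct.
    """
    seen = {}
    for xs, ys in product(samples, repeat=2):
        key = (agg(xs), agg(ys))
        combined = agg(xs + ys)
        if key in seen and seen[key][0] != combined:
            return seen[key][1], (xs, ys)
        seen.setdefault(key, (combined, (xs, ys)))
    return None

samples = [[1], [1, 1], [3], [2, 5]]

# max merges with itself, and no counterexample shows up on these samples.
max_ok = holds_on_samples(agg_max, max, samples)
max_witness = find_refutation(agg_max, samples)

# A bare mean cannot be merged: [1] and [1, 1] have the same mean,
# yet appending [3] to each gives different overall means.
mean_witness = find_refutation(agg_mean, samples)
```

Carrying extra state, such as a (sum, count) pair instead of the bare mean, is the standard way to make the mean mergeable.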
