Split-Apply-Combine with Dynamic Grouping
Mark P.J. van der Loo

TL;DR
This paper introduces the R package accumulate, which supports grouped aggregation where the grouping itself is determined dynamically: subsets that fail a user-defined condition are collapsed into larger ones according to a user-defined scheme. Existing split-apply-combine software does not cover this use case.
Contribution
It presents a novel algorithm and software implementation for dynamic grouping in data analysis, expanding the capabilities of split-apply-combine methods.
Findings
Provides a formal algorithm for dynamic grouping
Implements a user-friendly R package accumulate
Enables flexible, condition-based data aggregation
Abstract
Partitioning a data set by one or more of its attributes and computing an aggregate for each part is one of the most common operations in data analyses. There are use cases where the partitioning is determined dynamically by collapsing smaller subsets into larger ones, to ensure sufficient support for the computed aggregate. These use cases are not supported by software implementing split-apply-combine types of operations. This paper presents the R package accumulate that offers convenient interfaces for defining grouped aggregation where the grouping itself is dynamically determined, based on user-defined conditions on subsets, and a user-defined subset collapsing scheme. The formal underlying algorithm is described and analyzed as well.
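The idea described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than R and does not reproduce the accumulate package's interface or the paper's formal algorithm. The function dynamic_aggregate, its parameters (key, levels, test, fun), and the example records are all hypothetical names chosen for illustration. The sketch assumes a nested collapsing scheme, where each coarser level is a function of the finest group key.

```python
from statistics import mean


def dynamic_aggregate(records, key, levels, test, fun):
    """Aggregate per finest-level group, collapsing to coarser levels on demand.

    Hypothetical illustration, not the accumulate package API.
    key    : maps a record to its finest-level group key
    levels : functions mapping a finest key to a (coarser) group label,
             ordered from finest to coarsest
    test   : returns True if a subset has sufficient support
    fun    : the aggregate to compute on the accepted subset
    """
    result = {}
    for target in sorted({key(r) for r in records}):
        # Default when no level in the scheme satisfies the test.
        result[target] = (None, None)
        for depth, level in enumerate(levels):
            # Pool all records that share the target's label at this level.
            subset = [r for r in records if level(key(r)) == level(target)]
            if test(subset):
                result[target] = (depth, fun(subset))
                break
    return result


# Made-up example data: turnover by region and sector.
records = [
    {"region": "N", "sector": "A", "turnover": 10.0},
    {"region": "N", "sector": "A", "turnover": 14.0},
    {"region": "N", "sector": "A", "turnover": 12.0},
    {"region": "N", "sector": "B", "turnover": 20.0},
    {"region": "S", "sector": "A", "turnover": 30.0},
]

result = dynamic_aggregate(
    records,
    key=lambda r: (r["region"], r["sector"]),
    # Collapsing scheme: region x sector, then region, then everything.
    levels=[lambda k: k, lambda k: k[0], lambda k: None],
    # Condition on subsets: at least three records.
    test=lambda subset: len(subset) >= 3,
    # Aggregate: mean turnover.
    fun=lambda subset: mean(r["turnover"] for r in subset),
)

for group, (depth, value) in result.items():
    print(group, "collapse level:", depth, "mean turnover:", value)
```

In this example the group (N, A) has enough records at the finest level, (N, B) is computed over all of region N, and (S, A) falls back to the full data set. Every output group keeps its own label, and the returned collapse level records how far the subset had to be widened.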
Taxonomy
Topics: Embedded Systems Design Techniques
