Lachesis: Automatic Partitioning for UDF-Centric Analytics
Jia Zou, Amitabh Das, Pratik Barhate, Arun Iyengar, Binhang Yuan, Dimitrije Jankov, Chris Jermaine

TL;DR
Lachesis is a system that automates data partitioning in UDF-centric analytics workloads by modeling workflows and using deep reinforcement learning to optimize data storage, enhancing performance and productivity.
Contribution
It introduces a novel workflow-based representation for UDF workloads and employs deep reinforcement learning for automatic partitioning decisions.
Findings
Improves data processing performance in UDF workloads
Automates partitioning without manual intervention
Enhances productivity by optimizing data storage
Abstract
Persistent partitioning is effective in avoiding expensive shuffling operations. However, it remains a significant challenge to automate this process for Big Data analytics workloads that extensively use user-defined functions (UDFs), where sub-computations are harder to reuse for partitioning than in relational applications. In addition, functional dependencies, which are widely used for partitioning selection, are often unavailable in the unstructured data that is ubiquitous in UDF-centric analytics. We propose the Lachesis system, which represents UDF-centric workloads as workflows of analyzable and reusable sub-computations. Lachesis further adopts a deep reinforcement learning model to infer which sub-computations should be used to partition the underlying data. This analysis is then applied to automatically optimize the storage of the data across applications to improve the…
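To make the first sentence of the abstract concrete, the following is a minimal sketch of why persistent partitioning can avoid a shuffle. It is not Lachesis's API: `extract_author`, `partition_by`, `local_join`, `NUM_PARTITIONS`, and the sample records are all hypothetical names chosen for illustration. The idea shown is only that, if two datasets are stored partitioned by the same UDF sub-computation, a later join on that sub-computation can run partition by partition without moving data.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical number of storage partitions


def extract_author(record):
    """Hypothetical UDF sub-computation: derive a join key from a nested record."""
    return record["meta"]["author"].strip().lower()


def partition_by(records, key_fn, num_partitions=NUM_PARTITIONS):
    """Place each record in the partition given by a stable hash of key_fn(record)."""
    partitions = defaultdict(list)
    for record in records:
        pid = zlib.crc32(key_fn(record).encode("utf-8")) % num_partitions
        partitions[pid].append(record)
    return partitions


def local_join(left_parts, right_parts, left_key, right_key):
    """Join co-partitioned data one partition at a time, with no cross-partition exchange."""
    joined = []
    for pid in left_parts.keys() & right_parts.keys():
        index = defaultdict(list)
        for left in left_parts[pid]:
            index[left_key(left)].append(left)
        for right in right_parts[pid]:
            for left in index.get(right_key(right), []):
                joined.append((left, right))
    return joined


# Made-up sample data, stored once with the same partitioner.
posts = [
    {"id": 1, "meta": {"author": " Ada "}},
    {"id": 2, "meta": {"author": "Bob"}},
]
comments = [
    {"text": "hi", "meta": {"author": "ada"}},
    {"text": "yo", "meta": {"author": "BOB"}},
    {"text": "x", "meta": {"author": "carol"}},
]

post_parts = partition_by(posts, extract_author)
comment_parts = partition_by(comments, extract_author)
joined = local_join(post_parts, comment_parts, extract_author, extract_author)
```

Because both datasets are partitioned by the same key function, matching records always land in the same partition. Choosing which sub-computation to partition by is the decision the abstract says Lachesis delegates to a deep reinforcement learning model; this sketch simply fixes that choice by hand.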
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies
