Improving a High Productivity Data Analytics Chapel Framework
Prashanth Pai, Andrej Jakovljević, Zoran Budimlić, Costin Iancu

TL;DR
This paper enhances the Arkouda data analytics framework by introducing client-side optimizations that significantly boost performance while preserving its interactive capabilities, through techniques like operation delaying, caching, and subexpression elimination.
Contribution
The paper proposes novel client-side optimization techniques for Arkouda that improve performance without sacrificing interactivity, including operation delaying, caching, and result reuse.
Findings
Performance improvements of 20% to 120% across benchmarks
Maintains full interactivity of the framework
Effective caching and operation reuse strategies
Abstract
Most state-of-the-art exploratory data analysis frameworks fall into one of two extremes: they focus either on high-performance computation or on the interactive, open-ended aspects of the analysis. Arkouda is a framework that attempts to integrate the interactive approach with high-performance computation by using a novel client-server architecture, with a Python interpreter on the client side for interaction with the scientist and a Chapel server for performing the demanding high-performance computations. The Arkouda Python interpreter overloads the Python operators and transforms them into messages to the Chapel server, which performs the actual computation. In this paper, we propose several client-side optimization techniques for the Arkouda framework that maintain its interactive nature but at the same time significantly…
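The ideas named in the abstract (overloaded operators that become server messages, delayed operations, and reuse of repeated subexpressions) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Arkouda's API and not the authors' implementation: ToyServer, LazyClient, LazyArray, and materialize are hypothetical names, and the "server" is an in-process object that simply records each message it receives.

```python
class ToyServer:
    """Hypothetical stand-in for a remote compute server; records each message."""

    def __init__(self):
        self.messages = []

    def execute(self, op, left, right):
        self.messages.append(op)  # one entry per client-to-server message
        if op == "+":
            return [a + b for a, b in zip(left, right)]
        if op == "*":
            return [a * b for a, b in zip(left, right)]
        raise ValueError(f"unsupported op: {op}")


class LazyClient:
    """Hypothetical client that remembers every expression it has built."""

    def __init__(self, server):
        self.server = server
        self.nodes = {}  # structural key -> LazyArray, enables subexpression reuse

    def array(self, values):
        return LazyArray(self, ("const", tuple(values)), value=list(values))


class LazyArray:
    """Client-side handle: a concrete value or a delayed binary operation."""

    def __init__(self, client, key, value=None, operands=()):
        self.client = client
        self.key = key
        self.value = value  # stays None until the result is requested
        self.operands = operands

    def _binary(self, op, other):
        key = (op, self.key, other.key)
        node = self.client.nodes.get(key)
        if node is None:  # first time this expression is seen
            node = LazyArray(self.client, key, operands=(self, other))
            self.client.nodes[key] = node
        return node  # identical expressions share a single node

    def __add__(self, other):
        return self._binary("+", other)

    def __mul__(self, other):
        return self._binary("*", other)

    def materialize(self):
        """Send only the messages still needed, then cache the result."""
        if self.value is None:
            left, right = (operand.materialize() for operand in self.operands)
            self.value = self.client.server.execute(self.key[0], left, right)
        return self.value


server = ToyServer()
client = LazyClient(server)
a = client.array([1, 2, 3])
b = client.array([4, 5, 6])

c = (a + b) * (a + b)        # delayed: no message has been sent yet
sent_before = list(server.messages)

result = c.materialize()     # a + b is computed once and reused
print(result)                # [25, 49, 81]
print(server.messages)       # ['+', '*']
```

In this toy model an eager client would send three messages for the expression (two additions and one multiplication), while the delayed version sends two. Calling c.materialize() a second time sends nothing, because the result is cached on the client-side handle.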
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
