PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications
Stephen Mell, David Mell, Konstantinos Kallas, Steve Zdancewic, Osbert Bastani

TL;DR
PopPy is a system that automatically finds and exploits parallelism in Python-based compound AI applications, significantly reducing their execution time without altering program semantics.
Contribution
It introduces a novel approach that combines ahead-of-time compilation with runtime analysis to uncover parallelism in complex Python applications that invoke external AI components.
Findings
Achieves up to 6.4x speedup in real-world AI applications
Supports a broad subset of Python with minimal developer effort
Preserves sequential semantics while optimizing performance
Abstract
Compound AI applications, which compose calls to ML models using a general-purpose programming language like Python, are widely used for a variety of user-facing tasks, from software engineering to enterprise automation, making their end-to-end latency a critical bottleneck. In contrast to traditional applications, execution time is dominated by the external components, which cannot be handled by traditional language optimization systems, like optimizing compilers. To address this problem, we develop PopPy, a system that can uncover parallelization opportunities in Python applications that invoke these heavy external components, including those used in compound AI applications. PopPy supports a very expressive fragment of Python and requires minimal developer input to uncover parallelism. It combines an ahead-of-time compiler with a runtime, addressing three key challenges in…
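The kind of parallelism described above can be illustrated by hand. The sketch below is not PopPy's compiler or runtime. It is a manual example, assuming that each external model call is an independent, slow, blocking function. `call_model` is a hypothetical stand-in for such a call. The sequential version waits on each call in turn. The parallel version dispatches the calls concurrently through the standard-library `ThreadPoolExecutor`, and `map` returns the results in input order, so the output matches the sequential program.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a slow external ML model call."""
    time.sleep(0.2)  # simulate network / inference latency
    return prompt.upper()


def sequential(prompts: list[str]) -> list[str]:
    # Each call blocks until the previous one finishes, even though
    # the calls do not depend on each other.
    return [call_model(p) for p in prompts]


def parallel(prompts: list[str]) -> list[str]:
    # Independent calls run concurrently. Executor.map yields results
    # in input order, so the output is identical to sequential().
    with ThreadPoolExecutor(max_workers=max(1, len(prompts))) as pool:
        return list(pool.map(call_model, prompts))


if __name__ == "__main__":
    prompts = ["summarize a", "summarize b", "summarize c"]

    start = time.perf_counter()
    seq = sequential(prompts)
    t_seq = time.perf_counter() - start

    start = time.perf_counter()
    par = parallel(prompts)
    t_par = time.perf_counter() - start

    assert seq == par  # same results, same order
    print(f"sequential: {t_seq:.2f}s, parallel: {t_par:.2f}s")
```

With three independent calls of about 0.2 s each, the sequential version takes roughly 0.6 s and the parallel version takes roughly 0.2 s. PopPy's stated goal is to find and apply this kind of rewrite automatically, including in cases where dependencies between calls are only known at run time, rather than requiring the developer to restructure the code by hand.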
