Vortex: Efficient Sample-Free Dynamic Tensor Program Optimization via Hardware-aware Strategy Space Hierarchization
Yangjie Zhou, Honglin Zhu, Qian Qiu, Weihao Cui, Zihan Liu, Cong Guo, Siyuan Feng, Jintao Meng, Haidong Lan, Jingwen Leng, Wenxi Zhu, Minwen Deng

TL;DR
Vortex is a hardware-aware, sample-free compiler for dynamic-shape tensor programs that significantly improves compilation efficiency and runtime performance across CPU and GPU platforms by leveraging hardware information and strategy space hierarchization.
Contribution
Vortex introduces a novel hardware-driven, sample-free compilation approach with a bidirectional workflow and strategy space hierarchization for dynamic-shape tensor programs.
Findings
Reduces compilation time by 176x compared to existing methods.
Achieves 2.53x speedup on CPU and 3.01x on GPU over vendor libraries.
Outperforms existing dynamic-shape compilers in efficiency and performance.
Abstract
Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely heavily on predefined samples to guide the compilation process, which restricts their adaptability and efficiency. These sample-driven methods struggle to efficiently manage the diverse and unpredictable shapes encountered in real-world scenarios, often resulting in suboptimal performance. To tackle these issues, we introduce Vortex, a hardware-driven and sample-free compiler tailored for dynamic-shape tensor programs. Vortex capitalizes on detailed hardware information and hierarchizes the strategy space to facilitate high-performance code generation without relying on runtime shape samples. It features a unique bidirectional compilation workflow,…
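The idea of building a strategy space from hardware limits alone, with no shape samples, can be illustrated with a small sketch. This is not Vortex's actual algorithm or API: the two-level hierarchy, the capacities, the tile choices, the matmul footprint formula, and every name below (`HardwareLevel`, `fitting_tiles`, `build_strategy_space`, `select_at_runtime`) are assumptions made for illustration only. Tile candidates are enumerated per hardware level at compile time, and the concrete shape is consulted only at run time.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil, prod


@dataclass(frozen=True)
class HardwareLevel:
    """One level of a hypothetical memory hierarchy (illustrative values only)."""
    name: str
    capacity_bytes: int


def tile_footprint_bytes(tile, dtype_bytes=4):
    # For a matmul tile (tm, tn, tk): A is tm x tk, B is tk x tn, C is tm x tn.
    tm, tn, tk = tile
    return (tm * tk + tk * tn + tm * tn) * dtype_bytes


def fitting_tiles(level, choices):
    """Enumerate tiles that fit in one hardware level; no input shapes are used."""
    return [t for t in product(choices, repeat=3)
            if tile_footprint_bytes(t) <= level.capacity_bytes]


def build_strategy_space(inner, outer, choices=(8, 16, 32, 64, 128)):
    """Compile time: map each inner-level tile to the outer-level tiles it can nest in."""
    outer_tiles = fitting_tiles(outer, choices)
    space = {}
    for in_tile in fitting_tiles(inner, choices):
        nested = [o for o in outer_tiles
                  if all(o_d % i_d == 0 for o_d, i_d in zip(o, in_tile))]
        if nested:
            space[in_tile] = nested
    return space


def padded_work(shape, tile):
    # Work after rounding each dimension up to a whole number of tiles.
    return prod(ceil(s / t) * t for s, t in zip(shape, tile))


def select_at_runtime(space, shape):
    """Run time: pick the (inner, outer) pair with the least padded work,
    preferring larger outer tiles on ties (an assumed, simplistic cost model)."""
    pairs = ((i, o) for i, outs in space.items() for o in outs)
    return min(pairs, key=lambda p: (padded_work(shape, p[1]), -prod(p[1])))


# Hypothetical capacities: a 4 KiB innermost level and a 48 KiB outer level.
registers = HardwareLevel("registers", 4 * 1024)
shared = HardwareLevel("shared_memory", 48 * 1024)
space = build_strategy_space(registers, shared)
inner_tile, outer_tile = select_at_runtime(space, (100, 64, 32))
```

The strategy space here depends only on the assumed hardware description, so it is built once and reused for any `(M, N, K)` that arrives later. A real compiler would replace the padded-work heuristic with a proper performance model.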
Taxonomy
Topics: Computational Physics and Python Applications · Tensor decomposition and applications · Parallel Computing and Optimization Techniques
