BladeDISC++: Memory Optimizations Based On Symbolic Shape

Xiulong Yuan; Xu Yan; Wenting Shen; Xiafei Qiu; Ang Wang; Jie Zhang; Yong Li; and Wei Lin

arXiv:2412.16985 · cs.DC · December 24, 2024

TL;DR

BladeDISC++ introduces memory optimization techniques for dynamic shape graphs in deep learning. It uses symbolic shapes and a combined compile-time and runtime strategy to reduce memory usage when precise tensor shapes are not available.

Contribution

The paper presents novel op scheduling and rematerialization methods based on symbolic shapes, addressing memory optimization challenges in dynamic shape compilers.

Findings

01 Reduces memory consumption in dynamic shape graphs

02 Achieves memory efficiency comparable to precise shape optimizations

03 Enhances adoption of dynamic shape compilers in large models

Abstract

Recent deep learning workloads exhibit dynamic characteristics, leading to the rising adoption of dynamic shape compilers. These compilers can generate efficient kernels for dynamic shape graphs, which are characterized by a fixed graph topology and uncertain tensor shapes. However, memory optimization, although particularly crucial in this large model era, remains relatively underexplored for dynamic shape graphs. The fundamental challenge lies in the lack of precise tensor shapes, which are essential in conventional methods such as operation scheduling (op scheduling) and rematerialization. To address this challenge, we propose op scheduling and rematerialization approaches based on symbolic shapes and develop BladeDISC++. In addition, since rematerialization decisions cannot be made solely at compile time when tensor shapes are unknown, BladeDISC++ employs a compilation-runtime combined strategy to…
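To make the idea of reasoning over symbolic shapes concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes each tensor's element count can be written as an integer coefficient times a product of symbolic dimensions that are each at least 1. The names `SymSize`, `compare`, and `decide` are hypothetical and chosen only for illustration. When two sizes share the same symbolic dimensions, they can be ordered at compile time; otherwise the comparison is deferred until concrete values are known at runtime.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SymSize:
    """Element count as coeff * product(symbolic dims). Hypothetical helper."""
    coeff: int
    dims: tuple  # sorted names of symbolic dims, e.g. ("B", "S")

    @staticmethod
    def of(*shape):
        # Fold integer dims into the coefficient; keep string dims symbolic.
        coeff, dims = 1, []
        for d in shape:
            if isinstance(d, int):
                coeff *= d
            else:
                dims.append(d)
        return SymSize(coeff, tuple(sorted(dims)))

    def evaluate(self, bindings):
        # Concrete element count once every symbolic dim has a value.
        n = self.coeff
        for d in self.dims:
            n *= bindings[d]
        return n


def compare(a: SymSize, b: SymSize) -> Optional[int]:
    """Return -1, 0, or 1 if the order holds for all dim values >= 1.

    Return None when the order depends on the concrete dim values.
    Assumption: only sizes with identical symbolic dims are treated as
    comparable; a real compiler could prove more cases.
    """
    if a.dims == b.dims:
        return (a.coeff > b.coeff) - (a.coeff < b.coeff)
    return None


def decide(a: SymSize, b: SymSize, bindings):
    """Use the compile-time verdict if one exists, else fall back to runtime."""
    verdict = compare(a, b)
    if verdict is not None:
        return "compile-time", verdict
    va, vb = a.evaluate(bindings), b.evaluate(bindings)
    return "runtime", (va > vb) - (va < vb)


# Usage: [B, S, 1024] vs [B, S, 4096] is decidable symbolically,
# while [B, S, 1024] vs [B, 4096] depends on the runtime value of S.
x = SymSize.of("B", "S", 1024)
y = SymSize.of("B", "S", 4096)
z = SymSize.of("B", 4096)
stage_xy, order_xy = decide(x, y, {})
stage_xz, order_xz = decide(x, z, {"B": 2, "S": 8})
```

The design choice in this sketch is to keep the compile-time test conservative: it answers only when the answer is the same for every possible input shape, and leaves everything else to a cheap runtime check.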

Taxonomy

Topics: Parallel Computing and Optimization Techniques