VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination
Muyan Hu, Ahan Gupta, Jiachen Yuan, Vima Gupta, Taeksang Kim, Xin Xu, Janardhan Kulkarni, Ofer Dekel, Vikram Adve, Charith Mendis

TL;DR
VTC is a tensor compilation framework that eliminates unnecessary data movement in DNNs by using virtual tensors and an automatic strategy, outperforming existing ML compilers by up to 1.93x on NVIDIA GPUs.
Contribution
VTC introduces virtual tensors and a novel algorithm to eliminate all unnecessary data movement in DNN compilation, targeting the full spectrum of data movement operators rather than a subset.
Findings
VTC outperforms existing ML compilers by up to 1.93x on NVIDIA GPUs.
VTC achieves up to 60% inference memory savings.
VTC is effective across various DNN models.
Abstract
With the widening gap between compute and memory operation latencies, data movement optimizations have become increasingly important for DNN compilation. Current optimizations such as layout transformations and operator fusion only target a subset of tensor operators and consequently miss important opportunities for reducing data movement in contemporary DNN workloads, including large language models. We introduce VTC, a novel tensor compilation framework that for the first time eliminates all unnecessary data movement by targeting the full spectrum of data movement operators. VTC proposes the concept of virtual tensors to track data movement between compute operators via index mappings rather than expensive physical data transfers to and from global memory, which can seamlessly interoperate with existing computation kernels and handle arbitrary tensor operator compositions. We also…
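The idea of tracking data movement through index mappings instead of physical copies can be illustrated with a small sketch. This is not VTC's implementation, which targets GPU kernels. It is a minimal pure-Python model, and every name in it (`VirtualTensor`, `row_major_strides`, `row_sums`) is hypothetical. It assumes a row-major flat buffer and shows a transpose followed by a reshape being composed into one index function, so that a consuming compute operator reads the original buffer directly.

```python
from itertools import product


def row_major_strides(shape):
    """Strides (in elements) of a dense row-major tensor with this shape."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


class VirtualTensor:
    """Hypothetical view: a shape plus a map from logical index to buffer offset."""

    def __init__(self, buffer, shape, index_map):
        self.buffer = buffer          # shared storage, never copied
        self.shape = tuple(shape)
        self.index_map = index_map    # logical index tuple -> offset in buffer

    @classmethod
    def from_dense(cls, buffer, shape):
        strides = row_major_strides(shape)
        return cls(buffer, shape,
                   lambda idx: sum(i * s for i, s in zip(idx, strides)))

    def __getitem__(self, idx):
        return self.buffer[self.index_map(idx)]

    def transpose(self, perm):
        """New axis i is old axis perm[i]; only the index map changes."""
        parent = self.index_map
        new_shape = tuple(self.shape[p] for p in perm)

        def mapped(idx):
            old = [0] * len(perm)
            for new_axis, old_axis in enumerate(perm):
                old[old_axis] = idx[new_axis]
            return parent(tuple(old))

        return VirtualTensor(self.buffer, new_shape, mapped)

    def reshape(self, new_shape):
        """Row-major reshape expressed as linearize-then-unravel on indices."""
        parent, old_shape = self.index_map, self.shape
        new_strides = row_major_strides(new_shape)

        def mapped(idx):
            linear = sum(i * s for i, s in zip(idx, new_strides))
            old = []
            for dim in reversed(old_shape):
                old.append(linear % dim)
                linear //= dim
            return parent(tuple(reversed(old)))

        return VirtualTensor(self.buffer, new_shape, mapped)

    def materialize(self):
        """Physical copy in logical order; used here only for checking."""
        return [self[idx] for idx in product(*(range(d) for d in self.shape))]


def row_sums(t):
    """Stand-in compute operator that reads its 2-D input through the map."""
    rows, cols = t.shape
    return [sum(t[(r, c)] for c in range(cols)) for r in range(rows)]


buf = list(range(6))                          # dense 2x3 tensor
base = VirtualTensor.from_dense(buf, (2, 3))
transposed = base.transpose((1, 0))           # 3x2, no data moved
reshaped = transposed.reshape((2, 3))         # 2x3, still no data moved
sums = row_sums(reshaped)                     # reads buf via composed map
```

In this sketch the two data movement operators never write an intermediate tensor; the only memory traffic comes from the consumer reading `buf`. A real compiler would need to resolve such mappings inside generated kernels rather than through per-element Python closures.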
