VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination

Muyan Hu; Ahan Gupta; Jiachen Yuan; Vima Gupta; Taeksang Kim; Xin Xu; Janardhan Kulkarni; Ofer Dekel; Vikram Adve; Charith Mendis

arXiv:2604.09558 · cs.DC · April 14, 2026

TL;DR

VTC is a new tensor compilation framework that eliminates unnecessary data movement in DNNs by using virtual tensors and an automatic strategy, outperforming existing compilers on NVIDIA GPUs.

Contribution

VTC introduces virtual tensors and a novel algorithm that eliminate all unnecessary data movement in DNN compilation, covering the full spectrum of data movement operators.
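How VTC actually represents virtual tensors is not described here. As a rough illustration of the idea, the sketch below assumes that a virtual tensor is just a logical shape plus a function mapping each logical index to a location in an existing physical buffer. The names VirtualTensor, physical, transpose, reshape, slice_axis, concat and read_all are hypothetical, not VTC's API. The sketch shows how several data movement operators, including a piecewise one like concat, compose into a single index mapping without copying any data.

```python
from dataclasses import dataclass
from itertools import product
from math import prod
from typing import Callable, Dict, List, Tuple

Index = Tuple[int, ...]
Location = Tuple[int, int]  # (physical buffer id, flat offset in that buffer)


@dataclass(frozen=True)
class VirtualTensor:
    """Hypothetical virtual tensor: a logical shape plus an index mapping.
    Applying a data movement operator builds a new mapping; no data is copied."""
    shape: Tuple[int, ...]
    locate: Callable[[Index], Location]


def physical(buf_id: int, shape: Tuple[int, ...]) -> VirtualTensor:
    # Identity mapping onto a row-major buffer that already exists in memory.
    def locate(idx: Index) -> Location:
        off = 0
        for i, n in zip(idx, shape):
            off = off * n + i
        return (buf_id, off)
    return VirtualTensor(shape, locate)


def transpose(t: VirtualTensor, perm: Tuple[int, ...]) -> VirtualTensor:
    # Output axis k is input axis perm[k]; permute the index back to the input.
    def locate(idx: Index) -> Location:
        src = [0] * len(perm)
        for out_axis, in_axis in enumerate(perm):
            src[in_axis] = idx[out_axis]
        return t.locate(tuple(src))
    return VirtualTensor(tuple(t.shape[p] for p in perm), locate)


def reshape(t: VirtualTensor, shape: Tuple[int, ...]) -> VirtualTensor:
    assert prod(shape) == prod(t.shape)
    # Linearize the new index (row-major), then delinearize it in the old shape.
    def locate(idx: Index) -> Location:
        flat = 0
        for i, n in zip(idx, shape):
            flat = flat * n + i
        src = []
        for n in reversed(t.shape):
            src.append(flat % n)
            flat //= n
        return t.locate(tuple(reversed(src)))
    return VirtualTensor(shape, locate)


def slice_axis(t: VirtualTensor, axis: int, start: int, stop: int) -> VirtualTensor:
    # Shift the index along one axis by the slice start.
    def locate(idx: Index) -> Location:
        src = list(idx)
        src[axis] += start
        return t.locate(tuple(src))
    shape = t.shape[:axis] + (stop - start,) + t.shape[axis + 1:]
    return VirtualTensor(shape, locate)


def concat(a: VirtualTensor, b: VirtualTensor, axis: int) -> VirtualTensor:
    # Piecewise mapping: route each index to whichever input owns it.
    def locate(idx: Index) -> Location:
        if idx[axis] < a.shape[axis]:
            return a.locate(idx)
        src = list(idx)
        src[axis] -= a.shape[axis]
        return b.locate(tuple(src))
    shape = a.shape[:axis] + (a.shape[axis] + b.shape[axis],) + a.shape[axis + 1:]
    return VirtualTensor(shape, locate)


def read_all(t: VirtualTensor, buffers: Dict[int, List[float]]) -> List[float]:
    # What a consumer kernel would do: read each element through the mapping.
    locs = (t.locate(i) for i in product(*map(range, t.shape)))
    return [buffers[b][o] for b, o in locs]


buffers = {0: [0, 1, 2, 3, 4, 5], 1: [10, 11, 12]}
x = physical(0, (2, 3))  # [[0, 1, 2], [3, 4, 5]]
y = physical(1, (1, 3))  # [[10, 11, 12]]
# concat -> transpose -> slice -> reshape, all as one composed index mapping.
v = reshape(slice_axis(transpose(concat(x, y, 0), (1, 0)), 0, 1, 3), (6,))
print(read_all(v, buffers))  # [1, 4, 11, 2, 5, 12]
```

Four data movement operators collapse into one chain of index arithmetic, and the original buffers are never rewritten. A real compiler would presumably simplify such chains symbolically rather than evaluate Python closures per element; this sketch only shows the semantics.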

Findings

01

VTC outperforms existing ML compilers by up to 1.93x on NVIDIA GPUs.

02

Achieves up to 60% inference memory savings.

03

Demonstrates effectiveness across various DNN models.

Abstract

With the widening gap between compute and memory operation latencies, data movement optimizations have become increasingly important for DNN compilation. Current optimizations such as layout transformations and operator fusion only target a subset of tensor operators and consequently miss important opportunities for reducing data movement in contemporary DNN workloads, including large language models. We introduce VTC, a novel tensor compilation framework that, for the first time, eliminates all unnecessary data movement by targeting the full spectrum of data movement operators. VTC proposes the concept of virtual tensors, which track data movement between compute operators via index mappings rather than expensive physical data transfers to and from global memory. Virtual tensors interoperate seamlessly with existing computation kernels and handle arbitrary compositions of tensor operators. We also…
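To illustrate the claim that virtual tensors interoperate with existing kernels, here is a simplified sketch, not VTC's implementation. It assumes an affine (strided) index mapping, which covers transposes and slices but not piecewise operators such as concat. StridedView, materialize and matmul are hypothetical names. The same unmodified matmul kernel consumes either an eagerly transposed copy or a virtual transpose that only swaps strides. Both produce the same result, but only the eager path writes elements to a new buffer.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StridedView:
    """Hypothetical virtual tensor restricted to an affine index mapping:
    element (i, j) lives at buf[offset + i*strides[0] + j*strides[1]]."""
    buf: List[float]
    shape: Tuple[int, int]
    strides: Tuple[int, int]
    offset: int = 0

    def __getitem__(self, ij: Tuple[int, int]) -> float:
        i, j = ij
        return self.buf[self.offset + i * self.strides[0] + j * self.strides[1]]

    def t(self) -> "StridedView":
        # Transpose by swapping shape and strides; the buffer is untouched.
        return StridedView(self.buf, self.shape[::-1], self.strides[::-1], self.offset)


def materialize(v: StridedView) -> Tuple[StridedView, int]:
    # Eager path: copy into a fresh row-major buffer; also return elements written.
    rows, cols = v.shape
    data = [v[i, j] for i in range(rows) for j in range(cols)]
    return StridedView(data, v.shape, (cols, 1)), len(data)


def matmul(a: StridedView, b: StridedView) -> List[List[float]]:
    # Unmodified compute kernel: it only uses a[i, k] and b[k, j],
    # so it accepts physical and virtual operands alike.
    n, kdim = a.shape
    kb, m = b.shape
    assert kdim == kb
    return [[sum(a[i, k] * b[k, j] for k in range(kdim)) for j in range(m)]
            for i in range(n)]


x = StridedView([1, 2, 3, 4, 5, 6], (2, 3), (3, 1))  # [[1, 2, 3], [4, 5, 6]]
w = StridedView([1, 0, 1, 0, 1, 0], (2, 3), (3, 1))  # [[1, 0, 1], [0, 1, 0]]

w_t_copy, copied = materialize(w.t())  # eager transpose: 6 elements written
eager = matmul(x, w_t_copy)
virtual = matmul(x, w.t())             # virtual transpose: nothing written
print(eager, virtual, copied)  # [[4, 2], [10, 5]] [[4, 2], [10, 5]] 6
```

On a GPU, the eager copy would roughly correspond to the round trip through global memory that the abstract says virtual tensors avoid, while the kernel itself needs no changes beyond reading through an index mapping.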