TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, and Meng Zhang

arXiv:2604.12891 · cs.LG · April 15, 2026

TL;DR

TCL is a compiler framework that uses continual learning and active data selection to optimize tensor programs efficiently across diverse hardware platforms, reducing data collection costs and improving cross-platform transferability.

Contribution

It introduces three core components: the RDU Sampler, an active learning sampler that selects a small, informative subset of tensor programs; a lightweight Mamba-based cost model; and a knowledge distillation framework for cross-platform transfer learning.
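The summary does not say how the knowledge distillation framework is formulated. A common pattern for transferring a cost model to new hardware is to train a student model on the target platform with a blended objective: a supervised term on the few programs actually measured there, plus a term that pulls the student toward a teacher trained on a source platform. The Python sketch below shows that generic pattern using mean squared error. The function names, the `alpha` weighting, and the use of `None` for unmeasured programs are hypothetical choices for illustration, not TCL's published loss.

```python
from typing import Iterable, Optional, Sequence, Tuple


def _mse(pairs: Iterable[Tuple[float, float]]) -> float:
    """Mean squared error over (prediction, reference) pairs; 0.0 if empty."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum((p - r) ** 2 for p, r in pairs) / len(pairs)


def distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    measured: Sequence[Optional[float]],
    alpha: float = 0.5,
) -> float:
    """Hypothetical blended loss for transferring a cost model to new hardware.

    student:  student cost-model scores on the target platform
    teacher:  scores from a teacher trained on a source platform (soft targets)
    measured: measured target-platform labels, or None for unmeasured programs
    loss = alpha * MSE(student, measured) + (1 - alpha) * MSE(student, teacher)
    """
    if not (len(student) == len(teacher) == len(measured)):
        raise ValueError("student, teacher, and measured must have equal length")
    # Supervised term: only programs that were actually measured on the target.
    supervised = _mse((s, y) for s, y in zip(student, measured) if y is not None)
    # Distillation term: match the teacher on every program, measured or not.
    distill = _mse(zip(student, teacher))
    return alpha * supervised + (1.0 - alpha) * distill


if __name__ == "__main__":
    # One measured program, one unmeasured: supervised = 1.0, distill = 2.0.
    print(distillation_loss([1.0, 2.0], [1.0, 4.0], [2.0, None]))  # 1.5
```

In this sketch, moving `alpha` toward 1 trusts the scarce target-platform measurements more, while moving it toward 0 leans on the source-platform teacher for programs that were never measured.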

Findings

1. Achieves 16.8x faster tuning on CPU and 12.48x faster tuning on GPU.
2. Reduces data collection costs by selecting only 10% of tensor programs.
3. Improves inference latency by 1.20x on CPU and 1.13x on GPU.

Abstract

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, which incur high collection costs and offer suboptimal transferability across platforms. To address these challenges, we introduce TCL, an efficient and transferable compiler framework for fast tensor program optimization across diverse hardware platforms. Specifically, TCL is built on three core enablers: (1) the RDU Sampler, a data-efficient active learning strategy that selects only 10% of tensor programs by jointly optimizing Representativeness, Diversity, and Uncertainty, substantially reducing data collection costs while maintaining near-original model accuracy; (2) a new Mamba-based cost model that efficiently captures long-range schedule dependencies while achieving a favorable…
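The abstract describes the RDU Sampler only at a high level. As a rough illustration of how Representativeness, Diversity, and Uncertainty can be combined when picking about 10% of candidate programs, the following Python sketch greedily scores candidates by a weighted sum of the three terms. It is not the authors' algorithm: the feature vectors, the density-based representativeness score, the nearest-selected-distance diversity score, the externally supplied uncertainty values (for example, variance across an ensemble of cost models), the equal default weights, and the name `rdu_select` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def _distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two program feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale scores to [0, 1] so the three terms are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rdu_select(
    features: Sequence[Sequence[float]],
    uncertainty: Sequence[float],
    ratio: float = 0.1,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[int]:
    """Hypothetical greedy R + D + U selection of about `ratio` of the candidates.

    features:    one feature vector per candidate tensor program (assumed encoding)
    uncertainty: one cost-model uncertainty per candidate (assumed, e.g. ensemble variance)
    Returns the indices of the selected candidates in the order they were picked.
    """
    n = len(features)
    if n == 0:
        return []
    budget = max(1, round(ratio * n))
    w_rep, w_div, w_unc = weights

    # Representativeness: a low mean distance to all candidates means a dense region.
    rep = _normalize(
        [1.0 / (1.0 + sum(_distance(f, g) for g in features) / n) for f in features]
    )
    unc = _normalize(uncertainty)

    selected: List[int] = []
    chosen = set()
    # Diversity: distance from each candidate to its nearest already-selected program.
    min_dist = [math.inf] * n
    while len(selected) < budget:
        # Before the first pick there is no reference set, so diversity contributes 0.
        div = _normalize(min_dist) if selected else [0.0] * n
        best, best_score = -1, -math.inf
        for i in range(n):
            if i in chosen:
                continue
            score = w_rep * rep[i] + w_div * div[i] + w_unc * unc[i]
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        chosen.add(best)
        for i in range(n):
            min_dist[i] = min(min_dist[i], _distance(features[i], features[best]))
    return selected


if __name__ == "__main__":
    points = [[float(i)] for i in range(20)]
    # Diversity only: picks index 0, then the farthest point from it.
    print(rdu_select(points, [0.0] * 20, weights=(0.0, 1.0, 0.0)))  # [0, 19]
```

Greedy selection is used here because choosing the best subset jointly is combinatorial; each step adds the candidate with the highest combined score and then updates the diversity term. With diversity as the only active weight on twenty evenly spaced points, the second pick jumps to the opposite end of the range, which shows how that term spreads the measured programs out.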