A Hardware-Software Blueprint for Flexible Deep Learning Specialization
Thierry Moreau, Tianqi Chen, Luis Vega, Jared Roesch, Eddie Yan, Lianmin Zheng, Josh Fromm, Ziheng Jiang, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

TL;DR
This paper introduces VTA, a flexible, programmable deep learning architecture with a two-level ISA and a JIT compiler. It enables adaptable hardware acceleration for evolving DL workloads and is integrated into Apache TVM.
Contribution
The paper presents VTA, a novel extensible hardware-software blueprint for deep learning acceleration that maintains flexibility through a parametrizable design and runtime code generation.
Findings
VTA achieves high performance while keeping its architecture flexible.
DL models were successfully deployed on edge FPGAs using VTA.
VTA is open source and integrated into Apache TVM.
Abstract
Specialized Deep Learning (DL) acceleration stacks, designed for a specific set of frameworks, model architectures, operators, and data types, offer the allure of high performance while sacrificing flexibility. Changes in algorithms, models, operators, or numerical systems threaten the viability of specialized hardware accelerators. We propose VTA, a programmable deep learning architecture template designed to be extensible in the face of evolving workloads. VTA achieves this flexibility via a parametrizable architecture, two-level ISA, and a JIT compiler. The two-level ISA is based on (1) a task-ISA that explicitly orchestrates concurrent compute and memory tasks and (2) a microcode-ISA which implements a wide variety of operators with single-cycle tensor-tensor operations. Next, we propose a runtime system equipped with a JIT compiler for flexible code-generation and heterogeneous…
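The two-level ISA described in the abstract can be pictured with a toy interpreter. The sketch below is only an illustration under stated assumptions, not VTA's actual instruction set: the names `Task`, `MicroOp`, `run`, the buffer names `inp`/`wgt`/`acc`, and the `gemm_tile` kernel are all hypothetical, and tasks are executed one after another rather than concurrently.

```python
from dataclasses import dataclass
from typing import Dict, List

Tile = List[List[int]]  # a small square tensor tile


@dataclass
class MicroOp:
    """Microcode level (hypothetical): one tensor-tensor operation on on-chip buffers."""
    opcode: str  # only "gemm" is modeled in this sketch
    dst: int     # index into the accumulator buffer
    src_a: int   # index into the input buffer
    src_b: int   # index into the weight buffer


@dataclass
class Task:
    """Task level (hypothetical): a coarse-grained memory or compute task."""
    kind: str           # "load", "compute", or "store"
    buffer: str = ""    # on-chip buffer for load/store: "inp", "wgt", or "acc"
    index: int = 0      # slot inside that buffer
    dram_key: str = ""  # name of the tile in off-chip memory
    kernel: str = ""    # name of the micro-op kernel for a compute task


def matmul(a: Tile, b: Tile) -> Tile:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def run(tasks: List[Task], kernels: Dict[str, List[MicroOp]],
        dram: Dict[str, Tile]) -> Dict[str, Tile]:
    """Execute tasks in order; a real design would overlap memory and compute."""
    sram: Dict[str, Dict[int, Tile]] = {"inp": {}, "wgt": {}, "acc": {}}
    for task in tasks:
        if task.kind == "load":
            sram[task.buffer][task.index] = dram[task.dram_key]
        elif task.kind == "compute":
            # A compute task expands into a sequence of micro-ops.
            for uop in kernels[task.kernel]:
                if uop.opcode != "gemm":
                    raise ValueError(f"unsupported opcode: {uop.opcode}")
                prod = matmul(sram["inp"][uop.src_a], sram["wgt"][uop.src_b])
                n = len(prod)
                acc = sram["acc"].get(uop.dst, [[0] * n for _ in range(n)])
                # Accumulate the tile product into the destination slot.
                sram["acc"][uop.dst] = [[acc[i][j] + prod[i][j] for j in range(n)]
                                        for i in range(n)]
        elif task.kind == "store":
            dram[task.dram_key] = sram[task.buffer][task.index]
        else:
            raise ValueError(f"unknown task kind: {task.kind}")
    return dram


kernels = {"gemm_tile": [MicroOp("gemm", dst=0, src_a=0, src_b=0)]}
tasks = [
    Task("load", buffer="inp", index=0, dram_key="A"),
    Task("load", buffer="wgt", index=0, dram_key="B"),
    Task("compute", kernel="gemm_tile"),
    Task("store", buffer="acc", index=0, dram_key="C"),
]
result = run(tasks, kernels, {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]]})
print(result["C"])  # [[19, 22], [43, 50]]
```

The point of the split is that new operators can be expressed by changing the micro-op kernels while the task-level program that moves data stays the same.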
