VitaLLM: A Versatile, Ultra-Compact Ternary LLM Accelerator with Dependency-Aware Scheduling

Zi-Wei Lin and Tian-Sheuan Chang

arXiv:2604.27396 · cs.AR · May 5, 2026

TL;DR

VitaLLM is a hardware-software co-designed accelerator for ternary LLM inference on edge devices. It pairs dependency-aware scheduling with specialized compute cores to deliver high decoding throughput at low power while remaining versatile across workloads.

Contribution

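The paper introduces VitaLLM, an accelerator that combines a heterogeneous dual-core compute strategy (TINT-Cores for ternary projections, a BoothFlex-Core for mixed-precision attention), dependency-aware scheduling, and a Leading One Prediction (LOP) pruning mechanism to enable efficient ternary LLM deployment on resource-constrained hardware.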

Findings

01. Achieves 70.70 tokens/sec decoding throughput.

02. Consumes only 65.97 mW in a TSMC 16 nm process.

03. Outperforms state-of-the-art accelerators in figure of merit (FOM).
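
As a rough sanity check, the two headline figures can be combined into an energy-per-token estimate. The sketch below assumes the reported 65.97 mW is drawn while decoding at the reported 70.70 tokens/sec, which this summary does not state explicitly. The function name is illustrative and not from the paper.

```python
def energy_per_token_mj(power_mw: float, tokens_per_sec: float) -> float:
    """Energy per decoded token in millijoules.

    mW divided by tokens/s gives mJ per token, since 1 mW = 1 mJ/s.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return power_mw / tokens_per_sec


# Headline figures from the findings; pairing them is an assumption.
estimate = energy_per_token_mj(65.97, 70.70)
print(f"{estimate:.3f} mJ/token")
```

Under that assumption the estimate comes to a little under 1 mJ per decoded token. It is not the paper's FOM, whose definition is not given here.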

Abstract

Deploying Large Language Models (LLMs) on resource-constrained edge devices faces critical bottlenecks in memory bandwidth and power consumption. While ternary quantization (e.g., BitNet b1.58) significantly reduces model size, its direct deployment on general-purpose hardware is hindered by workload imbalance, bandwidth-bound decoding, and strict data dependencies. To address these challenges, we propose VitaLLM, a hardware-software co-designed accelerator tailored for efficient ternary LLM inference. We introduce a heterogeneous Dual-Core Compute Strategy that synergizes specialized TINT-Cores for massive ternary projections with a unified BoothFlex-Core for mixed-precision attention, ensuring high utilization across both compute-bound prefill and bandwidth-bound decode stages. Furthermore, we develop a Leading One Prediction (LOP) mechanism to prune…
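
To illustrate why ternary projections suit specialized hardware, here is a minimal sketch of a matrix-vector product with weights restricted to -1, 0, and +1, as in BitNet b1.58. Each output needs only additions and subtractions, and zero weights are skipped. This is a generic software illustration, not the TINT-Core datapath. The function name and test data are hypothetical.

```python
import random


def ternary_matvec(w_ternary, x):
    """Multiply a ternary weight matrix by a vector without multiplications."""
    out = []
    for row in w_ternary:
        acc = 0
        for weight, value in zip(row, x):
            if weight == 1:
                acc += value      # +1 weight: accumulate the activation
            elif weight == -1:
                acc -= value      # -1 weight: subtract the activation
            elif weight != 0:     # 0 weight: skip; anything else is invalid
                raise ValueError("weights must be -1, 0, or +1")
        out.append(acc)
    return out


# Hypothetical data: a 4x8 ternary matrix and int8-range activations.
random.seed(0)
w = [[random.choice((-1, 0, 1)) for _ in range(8)] for _ in range(4)]
x = [random.randint(-128, 127) for _ in range(8)]

# Cross-check against an ordinary multiply-accumulate reference.
reference = [sum(wi * xi for wi, xi in zip(row, x)) for row in w]
assert ternary_matvec(w, x) == reference
```

The cross-check at the end confirms that the add/subtract form matches a conventional dot product on the sample data.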
