YOCO: A Hybrid In-Memory Computing Architecture with 8-bit Sub-PetaOps/W In-Situ Multiply Arithmetic for Large-Scale AI
Zihao Xuan, Yuxuan Yang, Wei Xuan, Zijia Su, Song Chen, Yi Kang

TL;DR
YOCO is a novel AI accelerator architecture that combines in-memory computing with hybrid memory and optimized attention flow, achieving high energy efficiency and throughput for large-scale AI models.
Contribution
YOCO introduces a new 8-bit in-situ multiply arithmetic, hybrid ReRAM-SRAM memory structure, and an IMC-friendly attention computing flow, advancing in-memory AI hardware design.
Findings
Achieves 123.8 TOPS/W energy efficiency
Improves throughput by up to 33.6x over baselines
Enhances energy efficiency by up to 19.9x across models
Abstract
In this paper, we further explore the potential of analog in-memory computing (AiMC) and introduce an innovative artificial intelligence (AI) accelerator architecture named YOCO, featuring three key proposals: (1) YOCO proposes a novel 8-bit in-situ multiply arithmetic (IMA) achieving 123.8 TOPS/W energy-efficiency and 34.9 TOPS throughput through efficient charge-domain computation and timedomain accumulation mechanism. (2) YOCO employs a hybrid ReRAM-SRAM memory structure to balance computational efficiency and storage density. (3) YOCO tailors an IMC-friendly attention computing flow with an efficient pipeline to accelerate the inference of transformer-based AI models. Compared to three SOTA baselines, YOCO on average improves energy efficiency by up to 3.9x-19.9x and throughput by up to 6.8x-33.6x across 10 CNN/transformer models.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Semiconductor materials and devices
MethodsFocus
