YOCO: A Hybrid In-Memory Computing Architecture with 8-bit Sub-PetaOps/W In-Situ Multiply Arithmetic for Large-Scale AI

Zihao Xuan; Yuxuan Yang; Wei Xuan; Zijia Su; Song Chen; Yi Kang

arXiv:2312.11836 · cs.AR · June 12, 2025 · 1 citation

TL;DR

YOCO is an AI accelerator architecture that combines analog in-memory computing, a hybrid ReRAM-SRAM memory structure, and an IMC-friendly attention computing flow to achieve high energy efficiency and throughput on large-scale AI models.

Contribution

YOCO introduces a new 8-bit in-situ multiply arithmetic (IMA), a hybrid ReRAM-SRAM memory structure, and an IMC-friendly attention computing flow, advancing in-memory AI hardware design.

Findings

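01

The 8-bit IMA achieves 123.8 TOPS/W energy efficiency and 34.9 TOPS throughput

02

Improves throughput by up to 33.6x over three SOTA baselines across 10 CNN/transformer models

03

Improves energy efficiency by up to 19.9x over the same baselines and models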

Abstract

In this paper, we further explore the potential of analog in-memory computing (AiMC) and introduce an innovative artificial intelligence (AI) accelerator architecture named YOCO, featuring three key proposals: (1) YOCO proposes a novel 8-bit in-situ multiply arithmetic (IMA) that achieves 123.8 TOPS/W energy efficiency and 34.9 TOPS throughput through efficient charge-domain computation and a time-domain accumulation mechanism. (2) YOCO employs a hybrid ReRAM-SRAM memory structure to balance computational efficiency and storage density. (3) YOCO tailors an IMC-friendly attention computing flow with an efficient pipeline to accelerate the inference of transformer-based AI models. Compared to three state-of-the-art (SOTA) baselines, YOCO on average improves energy efficiency by up to 3.9x-19.9x and throughput by up to 6.8x-33.6x across 10 CNN/transformer models.
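To put the headline IMA figures in more familiar units, the following is a minimal sketch of the unit arithmetic only. It relies on the identity that 1 TOPS/W equals 10^12 operations per joule. The function names are hypothetical, and the implied-power calculation assumes that the 123.8 TOPS/W and 34.9 TOPS figures describe the same operating point, which the abstract does not state.

```python
# Unit conversions for the IMA figures quoted in the abstract.
# Function names are illustrative and do not come from the paper.

TERA = 1e12


def energy_per_op_fj(tops_per_watt: float) -> float:
    """Energy per operation in femtojoules for a given TOPS/W figure."""
    # 1 TOPS/W = 1e12 operations per second per watt = 1e12 operations per joule.
    joules_per_op = 1.0 / (tops_per_watt * TERA)
    return joules_per_op * 1e15  # joules -> femtojoules


def implied_power_w(tops: float, tops_per_watt: float) -> float:
    """Power in watts, ASSUMING both figures refer to the same operating point."""
    return tops / tops_per_watt


ima_efficiency = 123.8  # TOPS/W, from the abstract
ima_throughput = 34.9   # TOPS, from the abstract

print(f"Energy per 8-bit operation: {energy_per_op_fj(ima_efficiency):.2f} fJ")
print(f"Implied power (same operating point assumed): "
      f"{implied_power_w(ima_throughput, ima_efficiency):.3f} W")
```

Under these assumptions the efficiency figure corresponds to roughly 8.08 fJ per 8-bit operation, or about 0.12 PetaOps/W, which is consistent with the "Sub-PetaOps/W" wording in the title. The implied power of roughly 0.28 W is only indicative, since peak throughput and peak efficiency are often reported under different conditions.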

Taxonomy

Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Semiconductor materials and devices