Sorted Weight Sectioning for Energy-Efficient Unstructured Sparse DNNs on Compute-in-Memory Crossbars

Matheus Farias; H. T. Kung

arXiv:2410.11298 · cs.AR · July 10, 2025

TL;DR

This paper presents sorted weight sectioning (SWS), an algorithm that optimizes weight placement on compute-in-memory crossbars to significantly reduce ADC energy consumption in unstructured sparse DNNs, especially for models like BERT.

Contribution

The paper introduces SWS, a novel weight allocation algorithm that leverages weight distribution and sparsity to minimize ADC energy use in CIM crossbars for DNNs.

Findings

1. Reduces ADC energy by 89.5% on unstructured sparse BERT models.

2. Exploits both the weight distribution (most weights are near zero) and weight sparsity (zero weights) for energy savings.

3. Decreases energy consumption without significantly degrading DNN accuracy.

Abstract

We introduce sorted weight sectioning (SWS), a weight allocation algorithm that places sorted deep neural network (DNN) weight sections on bit-sliced compute-in-memory (CIM) crossbars to reduce analog-to-digital converter (ADC) energy consumption. Data conversions are the most energy-intensive process in crossbar operation. SWS reduces this cost by leveraging (1) small weights and (2) zero weights (weight sparsity). DNN weights follow bell-shaped distributions, with most weights near zero. With SWS, sections of low-magnitude weights need only the low-order crossbar columns. This reduces both the quantity and the resolution of the ADCs used, exponentially decreasing ADC energy costs without significantly degrading DNN accuracy. Unstructured sparsification further sharpens the weight distribution with small accuracy loss. However, it presents challenges in hardware…
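The core idea in the abstract, that sorting weights by magnitude before sectioning lets low-magnitude sections use only low-order columns, can be illustrated with a small sketch. This is not the authors' implementation. It assumes weights are already quantized to integers, that each crossbar cell stores one bit, and that the number of bit-slice columns a section needs is the bit length of its largest magnitude. The function names, the toy weight list, and the section size of 4 rows are all hypothetical. The sketch ignores everything else a real mapping must handle, such as sign representation and routing inputs to the reordered weights.

```python
from typing import List


def columns_needed(section: List[int]) -> int:
    """Bit-slice columns a section occupies under the 1-bit-per-cell assumption:
    the bit length of its largest magnitude (0 for an all-zero section)."""
    return max((abs(w).bit_length() for w in section), default=0)


def section_weights(weights: List[int], rows: int, sort: bool) -> List[List[int]]:
    """Split weights into sections of `rows` entries, optionally sorting by magnitude first."""
    ordered = sorted(weights, key=abs) if sort else list(weights)
    return [ordered[i:i + rows] for i in range(0, len(ordered), rows)]


def total_columns(weights: List[int], rows: int, sort: bool) -> int:
    """Total columns across all sections; used here as a rough proxy for ADC count."""
    return sum(columns_needed(s) for s in section_weights(weights, rows, sort))


# Hypothetical sparse, near-zero-heavy integer weights (illustrative only).
weights = [0, 3, 0, 120, 1, 0, 2, 64, 0, 5, 1, 0, 7, 0, 2, 90]
rows = 4  # hypothetical section size

unsorted_cols = total_columns(weights, rows, sort=False)
sorted_cols = total_columns(weights, rows, sort=True)
print(f"columns without sorting: {unsorted_cols}")  # 24
print(f"columns with sorting:    {sorted_cols}")    # 11
```

In this toy case, the unsorted layout spreads the few large weights across sections, so three of the four sections need all 7 columns. After sorting, the large weights share one section, the zeros fill a section that needs no columns at all, and the remaining sections need only 1 and 3 columns.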

Taxonomy

Topics: Network Packet Processing and Optimization · Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Dropout