'1'-bit Count-based Sorting Unit to Reduce Link Power in DNN Accelerators

Ruichi Han, Yizhi Chen, Tong Lei, Jordi Altayo Gonzalez, and Ahmed Hemani (Department of Electronics, Embedded Systems, KTH Royal Institute of Technology, Stockholm, Sweden)

arXiv:2601.14087 · cs.AR · January 21, 2026

Open Access

TL;DR

This paper introduces a hardware-efficient, approximate '1'-bit count-based sorting unit for DNN accelerators. The unit is optimized for CNNs and reduces hardware area while preserving most of the link power savings of precise sorting.

Contribution

It presents a comparison-free, approximate sorting hardware design that cuts area by up to 35.4% while retaining a 19.50% BT reduction, close to the 20.42% of the precise implementation.

Findings

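01 Achieves up to 35.4% area reduction with the approximate sorting unit

02 Reduces link power consumption by reordering data according to '1'-bit counts

03 Maintains a 19.50% bit-toggle (BT) reduction with approximate sorting, compared with 20.42% for the precise implementation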

Abstract

Interconnect power consumption remains a bottleneck in Deep Neural Network (DNN) accelerators. While ordering data based on '1'-bit counts can mitigate this via reduced switching activity, practical hardware sorting implementations remain underexplored. This work proposes the hardware implementation of a comparison-free sorting unit optimized for Convolutional Neural Networks (CNNs). By leveraging approximate computing to group population counts into coarse-grained buckets, our design achieves hardware area reductions while preserving the link power benefits of data reordering. Our approximate sorting unit achieves up to 35.4% area reduction while maintaining a 19.50% BT reduction, compared with 20.42% for the precise implementation.
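The idea in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: it is a minimal Python illustration that assumes 8-bit words, a bucket size of 2 popcount values per bucket, and hypothetical function names (`bit_toggles`, `precise_sort`, `approximate_sort`). It orders words by their '1'-bit count with a comparison-free bucket placement, then counts the bit toggles between consecutive words as a rough proxy for link switching activity.

```python
import random


def popcount(x: int) -> int:
    """Number of '1' bits in x."""
    return bin(x).count("1")


def bit_toggles(words):
    """Total bit flips between consecutive words sent over a link."""
    return sum(popcount(a ^ b) for a, b in zip(words, words[1:]))


def bucket_sort_by_key(words, key, num_keys):
    """Comparison-free ordering: each word is dropped into the bucket
    selected by its key, and the buckets are read out in key order."""
    buckets = [[] for _ in range(num_keys)]
    for w in words:
        buckets[key(w)].append(w)
    return [w for bucket in buckets for w in bucket]


def precise_sort(words, width=8):
    """One bucket per exact popcount value (0..width)."""
    return bucket_sort_by_key(words, popcount, width + 1)


def approximate_sort(words, width=8, bucket_size=2):
    """Coarse-grained buckets: neighbouring popcounts share a bucket.
    bucket_size is an assumed parameter, not a value from the paper."""
    num_buckets = width // bucket_size + 1
    return bucket_sort_by_key(
        words, lambda w: popcount(w) // bucket_size, num_buckets
    )


if __name__ == "__main__":
    random.seed(0)  # fixed seed so repeated runs use the same data
    data = [random.randrange(256) for _ in range(64)]
    print("unsorted   :", bit_toggles(data))
    print("precise    :", bit_toggles(precise_sort(data)))
    print("approximate:", bit_toggles(approximate_sort(data)))
```

Fewer buckets mean fewer destinations to track, which is the intuition behind trading a small amount of toggle reduction for area. The numbers this model prints for random data are not comparable to the paper's results, which are reported for a hardware implementation optimized for CNN workloads.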

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques