Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

Vishal Shashidhar; Anupam Kumari; Roy P Paily

arXiv:2603.10100·cs.LG·April 10, 2026

TL;DR

This paper introduces a hardware-efficient approximate convolution method that combines soft sparsity with a custom RISC-V instruction. On LeNet-5 (MNIST) it cuts MACs by up to 88.42% and estimated power by up to 35.2% with no accuracy loss.

Contribution

The paper proposes a soft sparsity paradigm that uses a hardware-efficient Most Significant Bit (MSB) proxy to skip negligible non-zero multiplications in CNNs, integrated as a custom RISC-V instruction.

Findings

01

Reduced ReLU MACs by 88.42% with no accuracy loss

02

Reduced Tanh MACs by 74.87% with no accuracy loss

03

Estimated power savings of 35.2% (ReLU) and 29.96% (Tanh) through clock-gating

Abstract

Modern CNNs' high computational demands hinder edge deployment, as traditional "hard" sparsity (skipping mathematical zeros) loses effectiveness in deep layers or with smooth activations like Tanh. We propose a "soft sparsity" paradigm using a hardware-efficient Most Significant Bit (MSB) proxy to skip negligible non-zero multiplications. Integrated as a custom RISC-V instruction and evaluated on LeNet-5 (MNIST), this method reduces ReLU MACs by 88.42% and Tanh MACs by 74.87% with zero accuracy loss, outperforming zero-skipping by 5x. By clock-gating inactive multipliers, we estimate power savings of 35.2% for ReLU and 29.96% for Tanh. While memory access makes power reduction sub-linear to operation savings, this approach significantly optimizes resource-constrained inference.
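
The abstract does not spell out the exact form of the MSB proxy, so the following is only a minimal software sketch of the general idea, under stated assumptions: operands are fixed-point integers, the bit length of each operand's magnitude stands in for its MSB position, and a product is skipped when the sum of the two bit lengths falls below a tunable threshold. The names `msb_position`, `approx_dot`, and `threshold` are hypothetical and are not taken from the paper.

```python
def msb_position(x: int) -> int:
    """Bit length of |x|: MSB index plus one, or 0 when x == 0."""
    return abs(x).bit_length()


def approx_dot(activations, weights, threshold):
    """Dot product that skips products judged negligible by an MSB proxy.

    Assumption (not from the paper): the proxy is the sum of the operand
    bit lengths, a cheap stand-in for log2(|a * w|).
    Returns (accumulated sum, number of multiplications performed).
    """
    acc = 0
    macs = 0
    for a, w in zip(activations, weights):
        if msb_position(a) + msb_position(w) < threshold:
            continue  # skipped: in hardware, the multiplier could be clock-gated
        acc += a * w
        macs += 1
    return acc, macs


acts = [0, 3, 120, -2, 64]
wts = [50, 1, -7, 2, 5]
print(approx_dot(acts, wts, threshold=8))  # (-520, 2)
print(approx_dot(acts, wts, threshold=0))  # (-521, 5), the exact result
```

In this toy input, raising the threshold to 8 drops three of the five multiplications while changing the sum by 1. The tunable error tolerance in the title corresponds to a knob of this kind, but the threshold values and proxy definition used by the authors may differ.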
