# An energy efficient processor array and memory controller for accurate processing of convolutional neural network-based inference engines

**Authors:** S. Deepika, V. Arunachalam

PMC · DOI: 10.1038/s41598-025-23303-5 · Scientific Reports · 2025-11-12

## TL;DR

This paper presents an energy-efficient processor array and memory controller for CNN inference that uses sparsity to improve performance and reduce energy consumption.

## Contribution

The novel contribution is a Combined IFM & Weights - Zero Valued Compression (CIW-ZVC) controller and a processor array design that exploit unstructured sparsity in the fully connected layers of CNNs for improved energy and area efficiency.

## Key findings

- The design achieves 256 × 10⁹ Operations/Second and 15 × 10¹² OPS/Watt energy efficiency per FC layer.
- It improves energy efficiency by up to 6.08 times and area efficiency by 7.6 times compared to existing processors.
- The pre-trained VGG-16 model, with ~20% induced sparsity, reaches 95% classification accuracy on the ImageNet dataset.

## Abstract

Exploiting unstructured sparsity in the hardware accelerator of a Convolutional Neural Network (CNN) based inference engine can improve energy efficiency. However, it requires a complex controller for indexing and load balancing. This work designs a controller for managing unstructured sparsity in Fully Connected (FC) layers. In a pre-trained Visual Geometry Group-16 (VGG-16) model, ~20% sparsity is introduced using an induced sparsity mechanism. ImageNet-based analysis of this model yields 95% classification accuracy and a 0.96 harmonic mean of precision and recall. Each Input Feature Map (IFM) and its corresponding weight vector of an FC layer are arranged in a row of memory. A Combined IFM & Weights - Zero Valued Compression (CIW-ZVC) controller permits only the valid data to pass from off-chip to on-chip memory, which improves the data-movement rate with minimal hardware overhead. A processor array of 256 Convolution Operators (COs), performing parallel computations with zero-gating on weights, computes in 16 tiles per on-chip memory cycle. The IFM is stationary for all the tiles, which makes load balancing straightforward. Implemented in 14 nm technology, the design achieves a peak performance of 256 × 10⁹ Operations/Second (OPS) and an energy efficiency of 15 × 10¹² OPS/Watt per FC layer of VGG-16. It also improves energy efficiency by up to 6.08 times and area efficiency by 7.6 times compared to existing processors.
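The row layout, compression, and zero-gating described in the abstract can be illustrated with a small software model. This is a minimal sketch under stated assumptions, not the authors' hardware: it assumes that "valid data" means rows whose IFM value is non-zero, and that zero-gating simply skips multiplications by zero weights. The names `compress_rows`, `fc_sparse`, and `fc_dense` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple

# One memory row: an IFM value and the weight vector attached to it
# (one weight per output neuron). This layout follows the abstract;
# the exact bit-level format in the paper is not modelled here.
Row = Tuple[float, List[float]]


def compress_rows(rows: List[Row]) -> List[Row]:
    """Assumed ZVC step: drop every row whose IFM value is zero."""
    return [(x, w) for x, w in rows if x != 0]


def fc_sparse(rows: List[Row], n_out: int) -> Tuple[List[float], int]:
    """Accumulate FC outputs, skipping zero weights (assumed zero-gating).

    Returns the output vector and the number of multiply-accumulates done.
    """
    acc = [0.0] * n_out
    macs = 0
    for x, weights in rows:  # the IFM value x is reused for all outputs
        for j, w in enumerate(weights):
            if w == 0:
                continue  # gated: no multiply, no accumulate
            acc[j] += x * w
            macs += 1
    return acc, macs


def fc_dense(rows: List[Row], n_out: int) -> Tuple[List[float], int]:
    """Dense reference: every product is computed, zeros included."""
    acc = [0.0] * n_out
    for x, weights in rows:
        for j, w in enumerate(weights):
            acc[j] += x * w
    return acc, len(rows) * n_out


rows = [
    (2.0, [1.0, 0.0, 3.0]),
    (0.0, [4.0, 5.0, 6.0]),  # zero IFM: the whole row is dropped
    (1.5, [0.0, 2.0, 0.0]),
]
kept = compress_rows(rows)
sparse_out, sparse_macs = fc_sparse(kept, 3)
dense_out, dense_macs = fc_dense(rows, 3)
print(len(kept), sparse_macs, dense_macs, sparse_out == dense_out)
```

In this toy input the sparse path produces the same outputs as the dense reference while moving fewer rows and performing fewer multiply-accumulates, which is the effect the abstract attributes to the controller and the zero-gated array.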

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12612215/full.md

## Figures

22 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12612215/full.md

## References

29 references — full list in the complete paper: https://tomesphere.com/paper/PMC12612215/full.md

---
Source: https://tomesphere.com/paper/PMC12612215