Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition
José Juan Hernández Morales, Georgios Mentzos, Frank Hannig, Konstantinos Balaskas, Georgios Zervakis, Jörg Henkel, Jürgen Teich

TL;DR
This paper presents a framework that combines approximate matrix decomposition with genetic algorithms to optimize CNN accelerators for TinyML devices, reducing inference latency by 33% on average without retraining.
Contribution
It introduces a training-free, hardware-efficient optimization method for CNNs on resource-constrained TinyML devices that combines approximate matrix decomposition with evolutionary search.
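The evolutionary-search half of this idea can be illustrated with a minimal genetic-algorithm sketch. Everything in it is hypothetical and not taken from the paper: the per-layer candidate options (which merely stand in for approximate-decomposition choices), their latency, accuracy-drop and LUT numbers, the additive cost model, the budgets, and the helper names `evaluate`, `fitness` and `genetic_search`. Because fitness here is computed from a fixed cost table, the search itself involves no retraining.

```python
import random

# Hypothetical design space: for each CNN layer, candidate implementations given as
# (latency_cycles, est_accuracy_drop_pct, lut_cost). Option 0 stands for the exact
# layer; higher options stand for more aggressive approximations. Made-up numbers.
LAYER_OPTIONS = [
    [(1000, 0.0, 400), (700, 0.3, 450), (500, 0.9, 520)],
    [(1200, 0.0, 500), (800, 0.4, 560), (600, 1.2, 640)],
    [(900, 0.0, 350), (650, 0.2, 380), (450, 0.8, 430)],
]
ACC_BUDGET = 1.0        # max tolerated accuracy drop (pct points); additivity is assumed
LUT_BUDGET = 1350       # hypothetical FPGA resource budget
INFEASIBLE = 1_000_000  # offset that ranks every infeasible design below every feasible one


def evaluate(genome):
    """Sum latency, estimated accuracy drop, and LUTs over the chosen per-layer options."""
    chosen = [LAYER_OPTIONS[i][g] for i, g in enumerate(genome)]
    return tuple(sum(values) for values in zip(*chosen))


def is_feasible(genome):
    _, acc, lut = evaluate(genome)
    return acc <= ACC_BUDGET and lut <= LUT_BUDGET


def fitness(genome):
    """Lower is better: total latency, pushed far down the ranking if a budget is violated."""
    lat, _, _ = evaluate(genome)
    return lat if is_feasible(genome) else INFEASIBLE + lat


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one of the two parents at random."""
    return [rng.choice(pair) for pair in zip(a, b)]


def mutate(genome, rng, rate=0.2):
    """Re-draw each gene (per-layer option index) with probability `rate`."""
    return [rng.randrange(len(LAYER_OPTIONS[i])) if rng.random() < rate else g
            for i, g in enumerate(genome)]


def genetic_search(pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    # Seed with the exact (unapproximated) design so a feasible candidate always exists.
    pop = [[0] * len(LAYER_OPTIONS)]
    pop += [[rng.randrange(len(opts)) for opts in LAYER_OPTIONS]
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: pop_size // 2]  # truncation selection; the best design is always kept
        children = [mutate(crossover(*rng.sample(parents, 2), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return min(pop, key=fitness)


best = genetic_search()
print(best, evaluate(best))
```

With only three layers, enumerating all 27 designs would be enough; a genetic search pays off when the number of layers and per-layer options makes exhaustive enumeration impractical. Seeding the population with the exact design, combined with elitist truncation selection, guarantees that the returned design always satisfies both budgets.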
Findings
Achieves an average latency improvement of 33% on TinyML benchmarks.
Maintains high accuracy, with an average accuracy loss of only 1.3%.
Generates FPGA accelerator designs that meet strict resource constraints.
Abstract
The tiny machine learning (TinyML) domain represents the paradigm shift toward local, on-device inference under stringent resource constraints. The primary goal of TinyML is to integrate intelligence into tiny, low-cost devices under strict resource, energy, and latency constraints. However, the ultra-resource-constrained nature of these devices can increase inference execution time, which can be detrimental in latency-critical applications. At the same time, TinyML applications often involve sensitive data. Latency optimization approaches that rely on training samples are therefore infeasible when such data is unavailable, proprietary, or sensitive, which highlights a pressing need for optimization approaches that do not require access to the training dataset and can be applied directly to pre-trained models. Replacing costly multiplications with more…
