RISC-V Based TinyML Accelerator for Depthwise Separable Convolutions in Edge AI
Muhammed Yildirim, Ozcan Ozturk

TL;DR
This paper presents a RISC-V based TinyML hardware accelerator that eliminates intermediate buffers for Depthwise Separable Convolutions, significantly reducing data movement and improving speed for Edge AI applications.
Contribution
A novel fused pixel-wise dataflow architecture implemented as a RISC-V CFU that eliminates intermediate buffers in DSC processing, reducing energy and latency.
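To make the idea of a fused pixel-wise dataflow concrete, here is a minimal Python sketch that contrasts layer-by-layer DSC execution with a per-pixel fused loop. It assumes a 3×3 depthwise kernel with stride 1 and zero padding 1, a 1×1 pointwise stage, integer tensors stored as nested lists in HWC layout, and no bias or activation. All function names are hypothetical. This is a software model of why no H×W×C intermediate buffer is required, not the authors' hardware dataflow.

```python
from typing import List

# Tensors are nested lists in HWC layout: x[row][col][channel].
Tensor3 = List[List[List[int]]]
DwKernel = List[List[List[int]]]  # [3][3][C] depthwise weights
PwKernel = List[List[int]]        # [C][K] pointwise (1x1) weights


def depthwise_pixel(x: Tensor3, dw: DwKernel, i: int, j: int) -> List[int]:
    """3x3 depthwise result for one output pixel (i, j), all C channels."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    acc = [0] * c
    for ki in range(3):
        for kj in range(3):
            yi, yj = i + ki - 1, j + kj - 1  # stride 1, zero padding 1
            if 0 <= yi < h and 0 <= yj < w:
                for ch in range(c):
                    acc[ch] += x[yi][yj][ch] * dw[ki][kj][ch]
    return acc


def pointwise_pixel(v: List[int], pw: PwKernel) -> List[int]:
    """1x1 convolution of one pixel: maps a C-vector to a K-vector."""
    return [sum(v[ch] * pw[ch][o] for ch in range(len(v))) for o in range(len(pw[0]))]


def dsc_layer_by_layer(x: Tensor3, dw: DwKernel, pw: PwKernel) -> Tensor3:
    h, w = len(x), len(x[0])
    # Stage 1 materializes the whole H x W x C intermediate feature map...
    mid = [[depthwise_pixel(x, dw, i, j) for j in range(w)] for i in range(h)]
    # ...which stage 2 must then read back in full.
    return [[pointwise_pixel(mid[i][j], pw) for j in range(w)] for i in range(h)]


def dsc_fused(x: Tensor3, dw: DwKernel, pw: PwKernel) -> Tensor3:
    h, w = len(x), len(x[0])
    out: Tensor3 = []
    for i in range(h):
        row: List[List[int]] = []
        for j in range(w):
            v = depthwise_pixel(x, dw, i, j)    # only C values are live at once
            row.append(pointwise_pixel(v, pw))  # consumed immediately, never stored
        out.append(row)
    return out


# Both schedules compute the same result on a small deterministic toy input.
H, W, C, K = 4, 5, 3, 2
x = [[[(7 * i + 3 * j + ch) % 5 - 2 for ch in range(C)] for j in range(W)] for i in range(H)]
dw = [[[(a + b + ch) % 3 - 1 for ch in range(C)] for b in range(3)] for a in range(3)]
pw = [[(ch * K + o) % 4 - 1 for o in range(K)] for ch in range(C)]
assert dsc_fused(x, dw, pw) == dsc_layer_by_layer(x, dw, pw)
```

The fused loop keeps a single C-length vector between the two stages, whereas the layer-by-layer version holds all H×W×C depthwise outputs before the pointwise stage can start; in hardware, that full map is what would otherwise occupy an intermediate buffer or make a DRAM round trip.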
Findings
Achieves up to 59.3x speedup over a software baseline.
Reduces data movement by up to 87%.
Supports compact ASIC implementations with low power consumption.
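As a rough illustration of where a data-movement saving comes from, the sketch below counts the bytes moved by one depthwise + pointwise pair under a deliberately simple model: every tensor is moved exactly once, elements are int8 (1 byte), and the only difference between the two schedules is the intermediate H×W×C map, which layer-by-layer execution writes out and reads back. The layer shape and all names are hypothetical (loosely MobileNetV2-like), and the resulting percentage is a property of this toy model, not a reproduction of the paper's 87% figure.

```python
from dataclasses import dataclass


@dataclass
class DscShape:
    """Hypothetical depthwise-separable layer: 3x3 depthwise, then 1x1 pointwise."""
    h: int
    w: int
    c: int  # channels into (and out of) the depthwise stage
    k: int  # output channels of the pointwise stage
    bytes_per_elem: int = 1  # assumption: int8 activations and weights


def common_traffic(s: DscShape) -> int:
    """Bytes moved by both schedules: input, both weight sets, final output (each once)."""
    elems = s.h * s.w * s.c + 9 * s.c + s.c * s.k + s.h * s.w * s.k
    return elems * s.bytes_per_elem


def intermediate_traffic(s: DscShape) -> int:
    """Extra bytes for layer-by-layer: write the H x W x C map, then read it back."""
    return 2 * s.h * s.w * s.c * s.bytes_per_elem


def traffic(s: DscShape, fused: bool) -> int:
    """Total bytes moved; the fused schedule skips the intermediate round trip."""
    return common_traffic(s) + (0 if fused else intermediate_traffic(s))


shape = DscShape(h=56, w=56, c=144, k=24)
lbl, fused = traffic(shape, fused=False), traffic(shape, fused=True)
print(f"layer-by-layer: {lbl} B, fused: {fused} B, saved: {1 - fused / lbl:.1%}")
```

For this made-up shape the model reports a saving of about 62.9%. Real figures depend on buffer sizes, halo re-reads, and which tensors fit on chip, none of which this sketch captures.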
Abstract
The increasing demand for on-device intelligence in Edge AI and TinyML applications requires the efficient execution of modern Convolutional Neural Networks (CNNs). While lightweight architectures like MobileNetV2 employ Depthwise Separable Convolutions (DSC) to reduce computational complexity, their multi-stage design introduces a critical performance bottleneck inherent to layer-by-layer execution: the high energy and latency cost of transferring intermediate feature maps to either large on-chip buffers or off-chip DRAM. To address this memory wall, this paper introduces a novel hardware accelerator architecture that utilizes a fused pixel-wise dataflow. Implemented as a Custom Function Unit (CFU) for a RISC-V processor, our architecture eliminates the need for intermediate buffers entirely, reducing data movement by up to 87% compared to conventional layer-by-layer execution. It…
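The abstract describes the accelerator as a Custom Function Unit attached to a RISC-V core. In setups such as CFU Playground, a CFU is invoked through a custom R-type instruction that passes two source-register values and returns one result. The sketch below is a hypothetical software model of such an instruction: a stateful 4-lane int8 multiply-accumulate. The function codes (`funct7` values), the class name `HypotheticalMacCfu`, and the helper `pack_int8x4` are all assumptions for illustration, not the paper's instruction set or datapath.

```python
from typing import Sequence


def pack_int8x4(vals: Sequence[int]) -> int:
    """Pack four signed int8 values into one 32-bit register word (lane 0 = low byte)."""
    assert len(vals) == 4 and all(-128 <= v <= 127 for v in vals)
    word = 0
    for lane, v in enumerate(vals):
        word |= (v & 0xFF) << (8 * lane)
    return word


def _lane(word: int, lane: int) -> int:
    """Extract one byte lane of a packed word as a signed int8."""
    b = (word >> (8 * lane)) & 0xFF
    return b - 256 if b >= 128 else b


def _to_int32(v: int) -> int:
    """Wrap an integer to a signed 32-bit value, as a register write would."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v >= (1 << 31) else v


class HypotheticalMacCfu:
    """Software model of a stateful CFU: funct7=0 resets, funct7=1 does a 4-lane MAC."""

    RESET, MAC4 = 0, 1  # assumed function codes, not taken from the paper

    def __init__(self) -> None:
        self.acc = 0  # accumulator held inside the unit between instructions

    def execute(self, funct7: int, rs1: int, rs2: int) -> int:
        if funct7 == self.RESET:
            self.acc = 0
        elif funct7 == self.MAC4:
            dot = sum(_lane(rs1, i) * _lane(rs2, i) for i in range(4))
            self.acc = _to_int32(self.acc + dot)
        else:
            raise ValueError(f"unknown funct7 {funct7}")
        return self.acc  # value written back to rd


cfu = HypotheticalMacCfu()
cfu.execute(HypotheticalMacCfu.RESET, 0, 0)
cfu.execute(HypotheticalMacCfu.MAC4, pack_int8x4([1, 2, 3, 4]), pack_int8x4([5, 6, 7, 8]))
result = cfu.execute(HypotheticalMacCfu.MAC4, pack_int8x4([-1, -2, -3, -4]), pack_int8x4([1, 1, 1, 1]))
print(result)  # 60
```

Each `MAC4` call folds four products into the accumulator kept inside the unit, so a C-channel dot product needs about C/4 custom instructions rather than C scalar multiply-adds. The model only illustrates the two-inputs, one-output CFU calling convention.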
Taxonomy
Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
