Toward Attention-based TinyML: A Heterogeneous Accelerated Architecture and Automated Deployment Flow
Philip Wiese, Gamze İslamoğlu, Moritz Scherer, Luka Macan, Victor J.B. Jung, Alessio Burrello, Francesco Conti, Luca Benini

TL;DR
This paper presents a heterogeneous tinyML architecture that couples an octa-core RISC-V cluster with a hardwired accelerator for quantized Attention, together with an automated deployment flow that runs end-to-end 8-bit Transformer inference at 2960 GOp/J and 154 GOp/s.
Contribution
It introduces an automated deployment flow and a heterogeneous architecture tailored for attention-based tinyML models, advancing the state-of-the-art in energy-efficient inference.
Findings
Achieved 2960 GOp/J energy efficiency
Reached 154 GOp/s throughput
Demonstrated end-to-end 8-bit Transformer inference
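The two headline figures can be combined into an implied power draw and an energy per operation. The sketch below is only a back-of-the-envelope check: it assumes both figures were measured at the same operating point (the abstract quotes them together at 0.65 V in 22 nm FD-SOI), and the function names are hypothetical, not taken from the paper.

```python
def implied_power_mw(throughput_gops: float, efficiency_gop_per_j: float) -> float:
    """Power in milliwatts implied by throughput (GOp/s) and efficiency (GOp/J).

    (GOp/s) / (GOp/J) = J/s = W; multiply by 1e3 for mW.
    Assumes both figures describe the same operating point.
    """
    return throughput_gops / efficiency_gop_per_j * 1e3


def energy_per_op_pj(efficiency_gop_per_j: float) -> float:
    """Energy per operation in picojoules.

    1 / (GOp/J) is nJ/Op; multiply by 1e3 for pJ/Op.
    """
    return 1e3 / efficiency_gop_per_j


if __name__ == "__main__":
    print(f"implied power: {implied_power_mw(154, 2960):.1f} mW")
    print(f"energy per op: {energy_per_op_pj(2960):.3f} pJ")
```

Under that assumption the figures correspond to roughly 52 mW and about 0.34 pJ per operation, which is consistent with the tinyML power envelope the paper targets.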
Abstract
One of the challenges for Tiny Machine Learning (tinyML) is keeping up with the evolution of Machine Learning models from Convolutional Neural Networks to Transformers. We address this by leveraging a heterogeneous architectural template coupling RISC-V processors with hardwired accelerators supported by an automated deployment flow. We demonstrate Attention-based models in a tinyML power envelope with an octa-core cluster coupled with an accelerator for quantized Attention. Our deployment flow enables end-to-end 8-bit Transformer inference, achieving leading-edge energy efficiency and throughput of 2960 GOp/J and 154 GOp/s (0.65 V, 22 nm FD-SOI technology).
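To make "quantized Attention" concrete, here is a minimal sketch of one attention row in which both matrix products use 8-bit integer operands. It is an illustration under stated assumptions, not the authors' accelerator or deployment flow: the symmetric per-tensor scales, the function names, and the floating-point softmax are all simplifications introduced here. An integer-only pipeline would replace the softmax step with an integer approximation.

```python
import math


def quantize(values, scale):
    """Symmetric int8 quantization: round(x / scale), clipped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int_dot(a, b):
    """Integer dot product; Python ints stand in for a wide hardware accumulator."""
    return sum(x * y for x, y in zip(a, b))


def attention_row_int8(q, keys, values, s_q, s_k, s_v):
    """Attention output for a single query, with int8 operands in both matmuls.

    Assumption: softmax is computed in floating point for clarity.
    """
    d = len(q)
    q8 = quantize(q, s_q)
    k8 = [quantize(k, s_k) for k in keys]
    v8 = [quantize(v, s_v) for v in values]

    # Q.K^T with integer accumulation, then rescaled to real-valued logits.
    logits = [int_dot(q8, k) * s_q * s_k / math.sqrt(d) for k in k8]

    # Numerically stable softmax (floating point, a simplification).
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Probabilities lie in [0, 1], so a scale of 1/127 fits them into int8.
    s_p = 1 / 127
    p8 = quantize(probs, s_p)

    # A.V with integer accumulation, rescaled to real values.
    dim_v = len(values[0])
    return [int_dot(p8, [v[j] for v in v8]) * s_p * s_v for j in range(dim_v)]


if __name__ == "__main__":
    s = 1 / 127
    out = attention_row_int8([1.0, 0.0],
                             [[1.0, 0.0], [0.0, 1.0]],
                             [[1.0, 0.0], [0.0, 1.0]],
                             s, s, s)
    print(out)
```

The only quantization error in this toy case comes from rounding the attention probabilities to 8 bits, so the output stays close to a floating-point reference.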
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Graph Theory and Algorithms · Distributed and Parallel Computing Systems
Methods: Attention Is All You Need · Byte Pair Encoding · Absolute Position Encodings · Dropout · Adam · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Softmax · Linear Layer
