Chameleon: A MatMul-Free Temporal Convolutional Network Accelerator for End-to-End Few-Shot and Continual Learning from Sequential Data
Douwe den Blanken, Charlotte Frenkel

TL;DR
Chameleon is a low-power, on-chip accelerator for end-to-end few-shot and continual learning from sequential data. It pairs a unified learning-and-inference architecture with a matrix-multiplication-free compute array for temporal convolutional networks.
Contribution
Chameleon introduces a unified architecture that supports few-shot learning (FSL), continual learning (CL), and inference at only 0.5% area overhead to the inference logic, and demonstrates end-to-end on-chip learning on raw audio data.
Findings
Achieves 96.8% accuracy on Omniglot 5-way 1-shot learning
Sets new records for end-to-end on-chip FSL and CL accuracy
Maintains 93.3% inference accuracy on Google Speech Commands at 3.1 μW
Abstract
On-device learning at the edge enables low-latency, private personalization with improved long-term robustness and reduced maintenance costs. Yet, achieving scalable, low-power end-to-end on-chip learning, especially from real-world sequential data with a limited number of examples, is an open challenge. Indeed, accelerators supporting error backpropagation optimize for learning performance at the expense of inference efficiency, while simplified learning algorithms often fail to reach acceptable accuracy targets. In this work, we present Chameleon, leveraging three key contributions to solve these challenges. (i) A unified learning and inference architecture supports few-shot learning (FSL), continual learning (CL) and inference at only 0.5% area overhead to the inference logic. (ii) Long temporal dependencies are efficiently captured with temporal convolutional networks (TCNs),…
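As an illustration of why TCNs can cover long temporal dependencies with few layers, the following is a minimal sketch of a causal dilated 1-D convolution and the resulting receptive field. It is a generic textbook formulation, not Chameleon's hardware dataflow: the function names, the kernel size, and the dilation schedule are assumptions chosen for the example, and it uses ordinary multiplications rather than the paper's MatMul-free compute array.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Causal dilated convolution: y[t] = sum_k weights[k] * x[t - k*dilation].

    Samples before t = 0 are treated as zero, so y[t] never depends on
    future inputs. All names here are hypothetical, for illustration only.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of input time steps seen by one output of a stack of layers,
    assuming one causal dilated convolution per entry in `dilations`."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Assumed schedule: dilation doubles at every layer.
    dilations = [1, 2, 4, 8]
    print(receptive_field(2, dilations))

    # Push an impulse through two stacked layers to show causality.
    signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
    hidden = causal_dilated_conv1d(signal, [1.0, 1.0], dilation=1)
    print(causal_dilated_conv1d(hidden, [1.0, 1.0], dilation=2))
```

With a dilation that doubles per layer, the receptive field grows exponentially with depth while the number of weights grows only linearly, which is the usual argument for using TCNs on long sequences such as raw audio.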
