TCN Mapping Optimization for Ultra-Low Power Time-Series Edge Inference
Alessio Burrello, Alberto Dequino, Daniele Jahier Pagliari, Francesco Conti, Marcello Zanghieri, Enrico Macii, Luca Benini, Massimo Poncino

TL;DR
This paper presents an automated optimization approach for deploying Temporal Convolutional Networks on ultra-low power microcontrollers, significantly reducing latency and energy consumption for time-series analysis.
Contribution
It introduces a layer tiling optimizer and optimized kernels for efficient TCN mapping on PULP microcontrollers, outperforming existing toolkits and approaches.
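To make the idea of a joint tiling and kernel-selection search concrete, here is a minimal sketch in Python. It is not the authors' optimizer: the function name `choose_tiling`, the int8 memory model, the fast-memory budget `l1_bytes`, and the per-variant cost functions are all hypothetical and chosen only for illustration. The one fixed relationship it uses is that an output tile of length `tile` needs `tile + (K - 1) * dilation` input samples.

```python
def choose_tiling(T, c_in, c_out, K, dilation, l1_bytes, variants):
    """Exhaustively pick an output-tile length and a kernel variant.

    T         : number of output time steps of the layer
    variants  : dict mapping a variant name to a HYPOTHETICAL cost function
                (tile length -> estimated cycles for one tile)
    Returns (estimated_total_cycles, tile_length, variant_name), or None
    if no tile fits in the memory budget.
    """
    best = None
    for tile in range(1, T + 1):
        # Input samples needed to produce `tile` outputs (receptive-field halo).
        in_len = tile + (K - 1) * dilation
        # Assumed footprint in bytes with 1-byte (int8) elements:
        # input tile + output tile + all weights of the layer.
        mem = c_in * in_len + c_out * tile + c_out * c_in * K
        if mem > l1_bytes:
            continue
        n_tiles = -(-T // tile)  # ceiling division
        for name, cost in variants.items():
            total = n_tiles * cost(tile)
            if best is None or total < best[0]:
                best = (total, tile, name)
    return best


# Two made-up kernel variants: one with high fixed overhead, one with
# higher per-sample cost.
variants = {
    "variant_a": lambda tile: 10 + tile,
    "variant_b": lambda tile: 4 + 2 * tile,
}
result = choose_tiling(T=8, c_in=1, c_out=1, K=2, dilation=1,
                       l1_bytes=10, variants=variants)
print(result)
```

A real deployment flow would replace the toy cost functions with measured or modeled kernel latencies and would also tile over channels; the exhaustive loop here only illustrates why tile size and kernel choice are searched together.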
Findings
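Up to 103X lower latency than the Cube-AI toolkit running on the STM32L4.
Up to 20.3X lower energy than the Cube-AI toolkit running on the STM32L4.
From 2.9X to 26.6X lower energy than commercial closed-source and academic open-source approaches on the same hardware target.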
Abstract
Temporal Convolutional Networks (TCNs) are emerging lightweight Deep Learning models for Time Series analysis. We introduce an automated exploration approach and a library of optimized kernels to map TCNs on Parallel Ultra-Low Power (PULP) microcontrollers. Our approach minimizes latency and energy by exploiting a layer tiling optimizer to jointly find the tiling dimensions and select among alternative implementations of the causal and dilated 1D-convolution operations at the core of TCNs. We benchmark our approach on a commercial PULP device, achieving up to 103X lower latency and 20.3X lower energy than the Cube-AI toolkit executed on the STM32L4 and from 2.9X to 26.6X lower energy compared to commercial closed-source and academic open-source approaches on the same hardware target.
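For readers unfamiliar with the operation named in the abstract, the following is a minimal single-channel sketch of a causal, dilated 1D convolution in plain Python. It is a reference illustration, not one of the optimized kernels from the paper; the function name and the indexing convention (`w[0]` multiplies the current sample, and samples before the start of the sequence are treated as zero) are assumptions made here.

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """Compute y[t] = sum_k w[k] * x[t - k * dilation].

    Causal: y[t] depends only on x[t] and earlier samples.
    Dilated: consecutive taps are `dilation` samples apart.
    Samples with a negative index are treated as zero (implicit left padding).
    """
    T, K = len(x), len(w)
    y = [0.0] * T
    for t in range(T):
        acc = 0.0
        for k in range(K):
            idx = t - k * dilation
            if idx >= 0:
                acc += w[k] * x[idx]
        y[t] = acc
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = causal_dilated_conv1d(x, w=[1.0, 1.0], dilation=2)
print(y)  # each output is x[t] + x[t - 2]
```

Because a dilated kernel with `K` taps spans `(K - 1) * dilation + 1` input samples, stacking layers with growing dilation lets a TCN cover a long history with few weights.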
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Advancements in PLL and VCO Technologies
