Once-for-All Channel Mixers (HYPERTINYPW): Generative Compression for TinyML
Yassien Shaalan

TL;DR
HYPER-TINYPW introduces a generative compression method for the 1x1 pointwise (PW) mixers of tiny neural networks on microcontrollers. It substantially reduces memory while maintaining accuracy and steady-state runtime performance across biosignal and speech tasks.
Contribution
It proposes a novel compression-as-generation approach that replaces most stored pointwise weights with weights generated at load time. A shared micro-MLP synthesizes them from tiny per-layer codes, enabling efficient TinyML deployment on commodity MCU runtimes.
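The generation idea can be made concrete with a small hypernetwork sketch. It is an illustration under our own assumptions, not the paper's generator. Here the shared micro-MLP is a hypothetical coordinate-based network that maps [per-layer code, normalized output-channel index, normalized input-channel index] to one weight, so a single generator can serve pointwise layers of any shape. The names `init_micro_mlp`, `generate_pw_kernel`, and `param_count` are invented for illustration, as are the layer shapes, code size, and hidden width. The generator is randomly initialized rather than trained.

```python
import math
import random


def init_micro_mlp(in_dim, hidden, seed=0):
    """Hypothetical shared generator: in_dim -> hidden (ReLU) -> one scalar."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
          for _ in range(hidden)]
    w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
    return {"w1": w1, "b1": [0.0] * hidden, "w2": w2, "b2": 0.0}


def mlp_scalar(params, x):
    """Evaluate the micro-MLP on one input vector and return one float."""
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(params["w1"], params["b1"])]
    return sum(w * hi for w, hi in zip(params["w2"], h)) + params["b2"]


def generate_pw_kernel(params, code, c_out, c_in):
    """Synthesize a c_out x c_in pointwise kernel from one per-layer code.

    W[o][i] = MLP(code + [o / (c_out - 1), i / (c_in - 1)]), so the same
    generator serves layers of any shape (an assumption of this sketch).
    """
    def pos(k, n):
        return k / (n - 1) if n > 1 else 0.0

    return [[mlp_scalar(params, list(code) + [pos(o, c_out), pos(i, c_in)])
             for i in range(c_in)]
            for o in range(c_out)]


def param_count(params):
    """Number of stored generator values (weights plus biases)."""
    return (sum(len(row) for row in params["w1"]) + len(params["b1"])
            + len(params["w2"]) + 1)


# Usage with hypothetical sizes: three PW layers share one generator.
CODE_DIM, HIDDEN = 4, 16
layer_shapes = [(32, 16), (64, 32), (64, 64)]  # (c_out, c_in), illustrative
gen = init_micro_mlp(CODE_DIM + 2, HIDDEN)     # +2 for the channel coordinates
rng = random.Random(1)
codes = [[rng.uniform(-1.0, 1.0) for _ in range(CODE_DIM)] for _ in layer_shapes]
kernels = [generate_pw_kernel(gen, z, co, ci)
           for z, (co, ci) in zip(codes, layer_shapes)]

dense = sum(co * ci for co, ci in layer_shapes)           # values if stored directly
stored = param_count(gen) + CODE_DIM * len(layer_shapes)  # generator + codes
print(dense, stored)  # 6656 141
```

The comparison of dense and stored values only illustrates why sharing one generator across layers can shrink storage. It is a raw parameter count, not the paper's packed-byte accounting. It also says nothing about accuracy, which depends on training the generator and codes.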
Findings
Reduces model size by 6.31x at similar accuracy.
Maintains at least 95% of large-model macro-F1 on ECG benchmarks.
Achieves 96.2% accuracy on Speech Commands with minimal memory.
Abstract
Deploying neural networks on microcontrollers is constrained by kilobytes of flash and SRAM, where 1x1 pointwise (PW) mixers often dominate memory even after INT8 quantization across vision, audio, and wearable sensing. We present HYPER-TINYPW, a compression-as-generation approach that replaces most stored PW weights with generated weights: a shared micro-MLP synthesizes PW kernels once at load time from tiny per-layer codes, caches them, and executes them with standard integer operators. This preserves commodity MCU runtimes and adds only a one-off synthesis cost; steady-state latency and energy match INT8 separable CNN baselines. Enforcing a shared latent basis across layers removes cross-layer redundancy, while keeping PW1 in INT8 stabilizes early, morphology-sensitive mixing. We contribute (i) TinyML-faithful packed-byte accounting covering generator, heads/factorization, codes,…
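The load-time flow described in the abstract can be sketched as follows: synthesize once, cache, then run standard integer operators. The sketch assumes per-tensor symmetric INT8 weight quantization, which the abstract does not specify. A hypothetical `synthesize(layer_id)` callable stands in for the micro-MLP. `PWKernelCache`, `toy_synth`, and the tensor shapes are illustrative names only.

```python
def quantize_int8_symmetric(kernel):
    """Per-tensor symmetric INT8: q = round(w / s) with s = max|w| / 127."""
    max_abs = max(abs(w) for row in kernel for w in row)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(w / scale))) for w in row] for row in kernel]
    return q, scale


class PWKernelCache:
    """Synthesize each generated PW kernel once, then reuse its INT8 copy."""

    def __init__(self, synthesize):
        self._synthesize = synthesize  # hypothetical: layer_id -> float kernel
        self._cache = {}
        self.synth_calls = 0

    def get(self, layer_id):
        if layer_id not in self._cache:  # one-off cost at load time
            self.synth_calls += 1
            self._cache[layer_id] = quantize_int8_symmetric(
                self._synthesize(layer_id))
        return self._cache[layer_id]


def pointwise_conv_int8(x_q, w_q):
    """1x1 convolution as an integer matmul.

    x_q: [pixels][c_in] INT8 activations; w_q: [c_out][c_in] INT8 weights.
    Returns [pixels][c_out] accumulators (typically int32 on an MCU); the
    requantization step back to INT8 is omitted.
    """
    return [[sum(a * b for a, b in zip(px, row)) for row in w_q] for px in x_q]


# Usage: a toy stand-in for the micro-MLP generator (hypothetical).
def toy_synth(layer_id):
    return [[0.01 * (layer_id + 1) * (3 * o - i) for i in range(3)]
            for o in range(2)]


cache = PWKernelCache(toy_synth)
x_q = [[1, 2, 3], [-4, 5, 0]]  # 2 pixels x 3 input channels, already INT8
for _ in range(5):             # five inferences reuse the cached kernel
    w_q, w_scale = cache.get(0)
    y = pointwise_conv_int8(x_q, w_q)
print(cache.synth_calls)  # 1
```

Synthesis runs only on the first `get` for each layer, so repeated inferences pay only for the integer product. This is in the spirit of the abstract's claim that steady-state cost matches an INT8 baseline.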
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · Low-power high-performance VLSI design
