Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices
Alexander Gräfe, Ding Huo, Vincent de Bakker, Johannes Berger, Marco Zimmerling, Sebastian Trimpe

TL;DR
This paper introduces CATS, a distributed transformer inference framework enabling ultra-low-power wireless devices to collaboratively run large models, overcoming individual device limitations through communication-aware partitioning and robustness techniques.
Contribution
The paper presents CATS, a novel framework for distributed transformer inference on ultra-low-power devices, including a new communication primitive and robust training methods.
Findings
Distributed inference enables models 14 times larger than a single device can run.
CATS's SomeGather primitive reduces communication bandwidth and RAM usage without sacrificing model accuracy.
Real-world deployment on 16 devices demonstrates practical feasibility.
Abstract
Transformer models are rapidly becoming a cornerstone of modern Internet of Things (IoT) applications, yet their computational and memory demands far exceed the capabilities of a single typical ultra-low-power IoT device. We present CATS, a framework for distributed transformer inference on ultra-low-power wireless devices, enabling multiple devices to collaboratively execute models far larger than what a single device can sustain. At its core, CATS is a communication-aware distributed transformer inference scheme co-designed across transformer partitioning, wireless communication and training. It employs SomeGather, a new pruned communication primitive that selectively broadcasts activation columns to reduce communication bandwidth and RAM usage without sacrificing model accuracy. Building on SomeGather, we design a partitioning method that exploits this primitive for efficient model…
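To make the idea of a pruned gather concrete, the toy sketch below contrasts a full all-gather with a gather that broadcasts only a chosen subset of activation columns. It is an illustration under stated assumptions, not the paper's SomeGather implementation: the function names, the `keep` set, and the layout of four devices holding two columns each are all hypothetical, and how CATS actually selects columns is not described in the abstract above.

```python
def all_gather(shards):
    """Every device broadcasts all of its columns; each receiver stores all of them."""
    gathered = {}
    sent = 0
    for device_cols in shards:
        for idx, col in device_cols.items():
            gathered[idx] = col
            sent += len(col)  # values put on the air
    return gathered, sent


def pruned_gather(shards, keep):
    """Broadcast only columns whose index is in `keep` (hypothetical selection rule)."""
    gathered = {}
    sent = 0
    for device_cols in shards:
        for idx, col in device_cols.items():
            if idx in keep:
                gathered[idx] = col
                sent += len(col)
    return gathered, sent


# Assumed toy setup: 4 devices, each owning 2 activation columns of 3 values.
shards = [
    {2 * d + j: [float(2 * d + j)] * 3 for j in range(2)}
    for d in range(4)
]

full, full_sent = all_gather(shards)
some, some_sent = pruned_gather(shards, keep={0, 3, 6})

print(f"all-gather:    {len(full)} columns stored, {full_sent} values sent")
print(f"pruned gather: {len(some)} columns stored, {some_sent} values sent")
```

In this toy setup the pruned variant sends and stores fewer values simply because fewer columns are exchanged, which is the intuition behind the bandwidth and RAM savings; whether accuracy is preserved depends on the selection and training scheme, which this sketch does not model.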
