Transformer Architecture with Minimal Inference Latency for Multi-Modal Wireless Networks
Minsu Kim, Walid Saad, Kui Wang, Zongdian Li, Tao Yu, Kei Sakaguchi

TL;DR
This paper introduces a fast multi-modal transformer framework for wireless networks that reduces inference latency and memory use by processing only important tokens, enabling real-time environment tracking.
Contribution
A novel token-based inference optimization for multi-modal transformers that significantly reduces latency and resource consumption in wireless communication tasks.
Findings
Reduced inference latency by 86.2% on beamforming tasks.
Lowered GPU memory usage by 35%.
Achieved 80% reduction in FLOPs with negligible accuracy loss.
Abstract
Next-generation wireless networks are expected to leverage multi-modal data sources to execute wireless communication tasks such as beamforming and blockage prediction with situational awareness. Multi-modal transformers have emerged as an effective tool for this purpose; however, existing transformer-based approaches suffer from high inference latency and large memory footprints when processing multi-modal data. As a result, existing solutions cannot handle wireless communication tasks that require fast inference to track a dynamically changing environment with moving vehicles and blockages. One major bottleneck is the reliance on attention mechanisms whose complexity grows quadratically with the number of tokens. Hence, in this paper, a novel, fast multi-modal transformer inference framework is designed to practically support wireless communication tasks by processing only…
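The quadratic bottleneck named in the abstract can be illustrated with a rough cost model. The sketch below is not the authors' method: the function names, the FLOP approximation (only the two attention matrix products of a single head are counted), and the score-based top-k selection rule are all assumptions made for illustration. It shows only why dropping tokens before attention pays off more than linearly.

```python
def attention_matmul_flops(num_tokens: int, head_dim: int) -> int:
    """Approximate FLOPs for one attention head (hypothetical cost model).

    Counts QK^T (n*n*d multiply-adds) and weights @ V (n*n*d multiply-adds),
    at 2 FLOPs per multiply-add. Softmax and projections are ignored.
    """
    return 4 * num_tokens * num_tokens * head_dim


def keep_top_k(scores, k):
    """Return the indices of the k highest-scoring tokens, in original order.

    The importance scores are assumed to be given; how they are computed
    is not specified here.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Halving the token count quarters the attention matmul cost.
full = attention_matmul_flops(200, 64)
pruned = attention_matmul_flops(100, 64)
ratio = pruned / full

# Toy example: keep the 2 most important of 4 tokens.
kept = keep_top_k([0.1, 0.9, 0.3, 0.7], 2)
```

Under this model, keeping a fraction r of the tokens scales the attention matmul cost by r squared, so `ratio` is 0.25 when half the tokens are kept. The percentages listed under Findings come from the paper's own measurements and are not derived from this model.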
