Adaptive Swin Transformer Partitioning over AI-RAN Networks

Tam Thanh Nguyen; Yong Hao Pua; Tuan Van Ngo; Mao V. Ngo; Jihong Park; Binbin Chen; and Tony Q. S. Quek

arXiv:2604.23554·cs.NI·April 28, 2026

TL;DR

This paper explores transformer-based split inference for real-time video object detection over 5G AI-RAN networks. It introduces throughput-aware adaptive partitioning and an activation compression pipeline, and validates the full system end to end on a real testbed.

Contribution

It extends throughput-aware adaptive splitting from CNNs to a Swin Transformer backbone without retraining, introduces an accuracy-preserving activation compression pipeline, and demonstrates a complete system for real-time inference over AI-RAN networks.
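To make the idea of throughput-aware split selection concrete, here is a minimal sketch. It is not the authors' implementation: the `SplitCandidate` type, the stage names, and all timing and payload numbers are hypothetical, and the latency model (device compute plus uplink transfer plus server compute) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCandidate:
    name: str           # hypothetical label for a split point in the backbone
    device_ms: float    # assumed on-device compute time up to the split
    server_ms: float    # assumed server compute time after the split
    payload_bytes: int  # assumed size of the activation sent over the uplink


def estimated_latency_ms(c: SplitCandidate, uplink_mbps: float) -> float:
    """Device compute + uplink transfer + server compute, in milliseconds."""
    if uplink_mbps <= 0:
        raise ValueError("uplink throughput must be positive")
    transfer_ms = c.payload_bytes * 8 / (uplink_mbps * 1e6) * 1e3
    return c.device_ms + transfer_ms + c.server_ms


def choose_split(candidates, uplink_mbps: float) -> SplitCandidate:
    """Pick the candidate with the lowest estimated end-to-end latency."""
    return min(candidates, key=lambda c: estimated_latency_ms(c, uplink_mbps))


# Illustrative numbers only: an early split sends a large activation,
# a late split sends a small one but costs more device compute.
CANDIDATES = [
    SplitCandidate("after_stage1", device_ms=5.0, server_ms=30.0, payload_bytes=1_500_000),
    SplitCandidate("after_stage3", device_ms=40.0, server_ms=8.0, payload_bytes=200_000),
]

if __name__ == "__main__":
    for mbps in (1000.0, 100.0):
        print(mbps, choose_split(CANDIDATES, mbps).name)
```

Under these assumed numbers, a fast uplink favors the early split (more work offloaded to the server), while a slower uplink favors the late split because its smaller payload dominates the total.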

Findings

01. Practical split execution is achievable for transformer-based vision models without retraining.

02. Activation compression substantially reduces the uplink payload while preserving accuracy.

03. The end-to-end system is validated on a real-time detection workload.

Abstract

This paper demonstrates the feasibility of transformer-based split inference for real-time video object detection over dynamic 5G AI-RAN networks. We extend throughput-aware adaptive splitting from CNNs to a Swin Transformer backbone and show that practical split execution is achievable for transformer-based vision models without retraining. To address the large intermediate activations inherent to transformers, we introduce an efficient, accuracy-preserving activation compression pipeline that substantially reduces uplink payload. The complete system (adaptive split selection, transformer inference, and compression) is implemented and validated end-to-end on a real-time detection workload, with distributed UPF (dUPF) integration further reducing user-plane latency and improving runtime stability. Extensive measurements on an NVIDIA Aerial-based AI-RAN testbed jointly…
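The abstract does not say how the activation compression pipeline works. As a generic illustration of why compressing intermediate activations shrinks the uplink payload, the sketch below uses a swapped-in technique: 8-bit uniform quantization followed by lossless deflate via `zlib`. The function names and the choice of technique are assumptions, not the paper's method.

```python
import zlib


def compress_activation(values):
    """Quantize floats to 8-bit codes, then deflate.

    Returns (blob, lo, scale). Raises ValueError on empty input.
    Hypothetical helper, not the paper's pipeline.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant input
    codes = bytes(min(255, round((v - lo) / scale)) for v in values)
    return zlib.compress(codes), lo, scale


def decompress_activation(blob, lo, scale):
    """Invert compress_activation up to the quantization error."""
    return [lo + code * scale for code in zlib.decompress(blob)]


if __name__ == "__main__":
    # Illustrative activation: mostly zeros plus a short ramp.
    activation = [0.0] * 900 + [i * 0.05 for i in range(100)]
    blob, lo, scale = compress_activation(activation)
    restored = decompress_activation(blob, lo, scale)
    max_err = max(abs(a - b) for a, b in zip(activation, restored))
    print(len(blob), max_err <= scale / 2 + 1e-9)
```

The reconstruction error of this scheme is bounded by half a quantization step, so the accuracy impact depends on how sensitive the downstream layers are to that step size. An accuracy-preserving pipeline like the one the abstract describes would need to control it.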
