Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Linyi Jiang, Silvery D. Fu, Yifei Zhu, Bo Li

TL;DR
Janus is a novel framework enabling low-latency, collaborative Vision Transformer inference on cloud and edge devices over dynamic networks, balancing accuracy, latency, and communication costs.
Contribution
It introduces a dynamic, collaborative ViT inference method combining token pruning and model splitting to optimize performance under fluctuating network conditions.
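The contribution pairs two knobs: how aggressively tokens are pruned and where the model is split between device and cloud. The toy sketch below shows how a joint search over both knobs can react to bandwidth. It is not Janus's scheduler. Every name (`Profile`, `tokens_at`, `estimate_latency`, `choose_config`) and every number is a hypothetical assumption: a ViT-B-like shape, a linear per-token cost model that ignores attention's quadratic term, a fixed number `r` of tokens pruned after each layer, and the rule "mildest pruning that meets the latency target", which uses mild pruning as a stand-in for high accuracy.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Profile:
    # Hypothetical ViT-B/16-like profile; all costs are illustrative assumptions.
    num_layers: int = 12
    init_tokens: int = 197             # 196 patches + [CLS] at 224x224, patch size 16
    dim: int = 768                     # hidden size
    bytes_per_elem: int = 2            # fp16 activations (assumed)
    image_bytes: int = 150_528         # raw 224x224x3 uint8 image
    device_s_per_token: float = 2e-5   # assumed per-token, per-layer cost on device
    cloud_s_per_token: float = 1e-6    # assumed per-token, per-layer cost on cloud


def tokens_at(layer: int, r: int, p: Profile) -> int:
    """Tokens entering `layer` when r tokens are pruned after every layer (keep at least 1)."""
    return max(1, p.init_tokens - r * layer)


def estimate_latency(split: int, r: int, bw_bytes_per_s: float, p: Profile) -> float:
    """Layers [0, split) run on device, [split, num_layers) on cloud."""
    device = sum(tokens_at(l, r, p) for l in range(split)) * p.device_s_per_token
    cloud = sum(tokens_at(l, r, p) for l in range(split, p.num_layers)) * p.cloud_s_per_token
    if split == 0:
        payload = p.image_bytes            # offload everything: send the raw image
    elif split == p.num_layers:
        payload = 0                        # fully on device; result size ignored
    else:
        payload = tokens_at(split, r, p) * p.dim * p.bytes_per_elem  # intermediate tokens
    return device + payload / bw_bytes_per_s + cloud


def choose_config(bw_bytes_per_s: float, slo_s: float, p: Profile,
                  rates: Tuple[int, ...] = (0, 4, 8, 12, 16)) -> Optional[Tuple[int, int]]:
    """Return (r, split) with the mildest pruning whose best split meets the SLO, else None."""
    for r in sorted(rates):
        best = min(range(p.num_layers + 1),
                   key=lambda s: estimate_latency(s, r, bw_bytes_per_s, p))
        if estimate_latency(best, r, bw_bytes_per_s, p) <= slo_s:
            return r, best
    return None


if __name__ == "__main__":
    prof = Profile()
    print(choose_config(1e7, 0.05, prof))  # fast link: no pruning, full offload
    print(choose_config(1e5, 0.03, prof))  # slow link: heavy pruning, run on device
```

Under these made-up costs, a 10 MB/s link yields `(0, 0)` (no pruning, send the raw image to the cloud), while a 100 KB/s link with a 30 ms target yields `(16, 12)` (heavy pruning, fully on device). A realistic scheduler would replace the linear cost model with profiled layer latencies and an accuracy model for each pruning level.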
Findings
Increases throughput by up to 5.15 times
Reduces latency violation ratios by up to 98.7%
Balances accuracy and latency effectively
Abstract
Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Network architectures and achieved state-of-the-art results in various computer vision tasks. Because ViTs are computationally expensive, they must either be pruned to run on resource-limited edge devices alone or be executed on remote cloud servers after the raw data is transmitted over fluctuating networks. Both the resulting degraded performance and the high latency hinder their widespread application. In this paper, we present Janus, the first framework for low-latency cloud-device collaborative Vision Transformer inference over dynamic networks. Janus overcomes the intrinsic model limitations of ViTs and enables collaborative execution of ViT models on both cloud and edge devices, achieving low latency, high accuracy, and low communication overhead. Specifically, Janus judiciously combines…
