Janus: Collaborative Vision Transformer Under Dynamic Network Environment

Linyi Jiang; Silvery D. Fu; Yifei Zhu; Bo Li

arXiv:2502.10047·cs.DC·February 17, 2025

Open Access

TL;DR

Janus is a novel framework enabling low-latency, collaborative Vision Transformer inference on cloud and edge devices over dynamic networks, balancing accuracy, latency, and communication costs.

Contribution

It introduces a dynamic, collaborative ViT inference method combining token pruning and model splitting to optimize performance under fluctuating network conditions.
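
The interplay between the two techniques named above can be illustrated with a toy sketch. This is not the Janus algorithm: the cost model, the `ViTProfile` fields, the candidate keep ratios, and all numbers are hypothetical assumptions chosen only to show how a split point and a token-pruning level might be picked jointly as bandwidth changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTProfile:
    """Hypothetical cost profile; every value is an illustrative assumption."""
    num_layers: int
    num_tokens: int
    token_bytes: int           # bytes per intermediate token
    raw_input_bytes: int       # bytes of the unprocessed input image
    device_ms_per_token: float
    cloud_ms_per_token: float


@dataclass(frozen=True)
class Plan:
    split_layer: int   # layers [0, split_layer) run on the device, the rest on the cloud
    keep_ratio: float  # fraction of tokens kept after each layer
    latency_ms: float


def estimate_latency_ms(profile, split_layer, keep_ratio, bandwidth_bytes_per_ms):
    """Toy model: per-layer compute scales with token count, plus one transfer at the split."""
    tokens = float(profile.num_tokens)
    compute_ms = 0.0
    transfer_ms = 0.0
    for layer in range(profile.num_layers):
        if layer == split_layer:
            # Splitting before layer 0 ships the raw input; otherwise ship surviving tokens.
            payload = profile.raw_input_bytes if layer == 0 else tokens * profile.token_bytes
            transfer_ms = payload / bandwidth_bytes_per_ms
        on_device = layer < split_layer
        per_token = profile.device_ms_per_token if on_device else profile.cloud_ms_per_token
        compute_ms += tokens * per_token
        tokens = max(1.0, tokens * keep_ratio)  # prune tokens after each layer
    # split_layer == num_layers means device-only execution, so no transfer is added.
    return compute_ms + transfer_ms


def choose_plan(profile, bandwidth_bytes_per_ms, keep_ratios=(1.0, 0.9, 0.8)):
    """Exhaustively pick the (split, keep ratio) pair with the lowest estimated latency.

    The keep_ratios tuple stands in for an accuracy constraint: only ratios
    assumed to be acceptable are offered as candidates.
    """
    candidates = (
        Plan(s, r, estimate_latency_ms(profile, s, r, bandwidth_bytes_per_ms))
        for s in range(profile.num_layers + 1)
        for r in keep_ratios
    )
    return min(candidates, key=lambda plan: plan.latency_ms)


# Assumed numbers: 12 layers, 197 tokens of 768 float32 values, a 224x224 RGB input.
PROFILE = ViTProfile(12, 197, 3072, 150528, 0.05, 0.005)
fast = choose_plan(PROFILE, bandwidth_bytes_per_ms=1e9)  # near-free network
slow = choose_plan(PROFILE, bandwidth_bytes_per_ms=1.0)  # nearly unusable network
print(fast.split_layer, slow.split_layer)
```

Under these assumed costs the sketch sends everything to the cloud when the network is fast and keeps everything on the device when it is slow; intermediate bandwidths can favor a split in the middle, where pruning has already shrunk the payload.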

Findings

01. Increases throughput by up to 5.15 times

02. Reduces latency violation ratios by up to 98.7%

03. Balances accuracy and latency effectively

Abstract

Vision Transformers (ViTs) have outperformed traditional Convolutional Neural Network architectures and achieved state-of-the-art results in various computer vision tasks. Because ViTs are computationally expensive, they must either be pruned to run entirely on resource-limited edge devices or be executed on remote cloud servers, which requires transmitting the raw data over fluctuating networks. The first option degrades accuracy and the second incurs high latency, and both hinder widespread adoption. In this paper, we present Janus, the first framework for low-latency cloud-device collaborative Vision Transformer inference over dynamic networks. Janus overcomes the intrinsic model limitations of ViTs and enables collaborative execution of ViT models across cloud and edge devices, achieving low latency, high accuracy, and low communication overhead. Specifically, Janus judiciously combines…

Taxonomy

Topics: Modular Robots and Swarm Intelligence · Teleoperation and Haptic Systems · Robotics and Automated Systems

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax