Hyperion: Low-Latency Ultra-HD Video Analytics via Collaborative Vision Transformer Inference
Linyi Jiang, Yifei Zhu, Hao Yin, Bo Li

TL;DR
Hyperion is a cloud-device collaborative framework for low-latency Ultra-HD video analytics with vision transformers. It balances on-device computation against transmission to the cloud to improve speed and accuracy under dynamic network conditions.
Contribution
Hyperion introduces a novel collaborative approach with importance scoring, adaptive scheduling, and result fusion to optimize Ultra-HD transformer inference across cloud and device.
Findings
Increases frame processing rate by up to 1.61x
Improves accuracy by up to 20.2%
Effective under various network conditions
Abstract
Recent advancements in array-camera videography enable real-time capture of ultra-high-definition (Ultra-HD) videos, providing rich visual information in a large field of view. However, promptly processing such data using state-of-the-art transformer-based vision foundation models incurs significant computational overhead in on-device computing or significant transmission overhead in cloud computing. In this paper, we present Hyperion, the first cloud-device collaborative framework that enables low-latency inference on Ultra-HD vision data using off-the-shelf vision transformers over dynamic networks. Hyperion addresses the computational and transmission bottlenecks of Ultra-HD vision transformers by exploiting an intrinsic property of vision transformer models. Specifically, Hyperion integrates a collaboration-aware importance scorer that identifies critical regions at the patch level, a dynamic…
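The idea of scoring regions at the patch level can be illustrated with a minimal sketch. This is not Hyperion's scorer: the abstract does not specify how importance is computed, so the example below assumes pixel variance as a stand-in score. The function names (`split_into_patches`, `score_patches`, `partition_by_importance`) and the `budget` parameter are hypothetical. Which group is handled on the device and which in the cloud is a policy choice that this summary does not describe, so the sketch only separates high-scoring from low-scoring patches.

```python
from statistics import pvariance


def split_into_patches(frame, patch):
    """Split a 2D grid (list of rows) into non-overlapping patch x patch tiles.

    Returns a dict mapping (patch_row, patch_col) to a flat list of pixel values.
    """
    rows, cols = len(frame), len(frame[0])
    patches = {}
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            patches[(r // patch, c // patch)] = [
                frame[r + dr][c + dc]
                for dr in range(patch)
                for dc in range(patch)
            ]
    return patches


def score_patches(patches):
    """Assumption: pixel variance stands in for a learned importance score."""
    return {idx: pvariance(pixels) for idx, pixels in patches.items()}


def partition_by_importance(scores, budget):
    """Return (high, low): the `budget` highest-scoring patch indices and the rest.

    Ties are broken by patch index so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda idx: (-scores[idx], idx))
    return ranked[:budget], ranked[budget:]


# Toy 4x4 grayscale frame split into four 2x2 patches.
frame = [
    [0, 0, 5, 5],
    [0, 0, 5, 5],
    [0, 9, 1, 2],
    [9, 0, 3, 4],
]
patches = split_into_patches(frame, 2)
scores = score_patches(patches)
high, low = partition_by_importance(scores, budget=1)
```

In this toy frame the bottom-left patch has the largest variance, so it is the only index in `high` when `budget=1`. In a real system the budget would presumably vary with network and compute conditions, which is the role the summary attributes to adaptive scheduling.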
Taxonomy
Topics: Video Coding and Compression Technologies · Image and Video Quality Assessment · Advanced Image Processing Techniques
