Artic: AI-oriented Real-time Communication for MLLM Video Assistant
Jiangkai Wu, Zhiyuan Ren, Junquan Zhong, Liming Liu, Xinggong Zhang

TL;DR
Artic is an AI-oriented RTC framework for MLLM Video Assistants. It improves response accuracy and reduces latency through adaptive bitrate management and targeted streaming, addressing the latency spikes and accuracy drops observed with current RTC frameworks.
Contribution
The paper presents Artic, a novel RTC framework tailored for MLLM Video Assistants, with new adaptive bitrate and streaming techniques optimized for AI understanding of video content.
Findings
Improves MLLM accuracy by 15.12%
Reduces latency by 135.31 ms
Introduces the first degraded video understanding benchmark
Abstract
AI Video Assistant emerges as a new paradigm for Real-time Communication (RTC), where one peer is a Multimodal Large Language Model (MLLM) deployed in the cloud. This makes interaction between humans and AI more intuitive, akin to chatting with a real person. However, a fundamental mismatch exists between current RTC frameworks and AI Video Assistants, stemming from the drastic shift in Quality of Experience (QoE) and more challenging networks. Measurements on our production prototype also confirm that current RTC fails, causing latency spikes and accuracy drops. To address these challenges, we propose Artic, an AI-oriented RTC framework for MLLM Video Assistants, exploring the shift from "humans watching video" to "AI understanding video." Specifically, Artic proposes: (1) Response Capability-aware Adaptive Bitrate, which utilizes MLLM accuracy saturation to proactively cap bitrate,…
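The idea of capping bitrate at the point where MLLM accuracy saturates can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are cut off in the abstract above. It assumes a hypothetical offline profile of (bitrate, accuracy) points, an illustrative `tolerance` parameter, and made-up function names and numbers.

```python
from typing import Sequence, Tuple


def saturation_bitrate(curve: Sequence[Tuple[float, float]],
                       tolerance: float = 0.01) -> float:
    """Return the lowest bitrate whose accuracy is within `tolerance`
    of the best accuracy on the profiled curve (hypothetical heuristic)."""
    if not curve:
        raise ValueError("curve must contain at least one (bitrate, accuracy) point")
    best = max(acc for _, acc in curve)
    return min(rate for rate, acc in curve if acc >= best - tolerance)


def capped_target_bitrate(bandwidth_estimate_kbps: float, cap_kbps: float) -> float:
    """Never send above the saturation cap, even if the network allows it."""
    return min(bandwidth_estimate_kbps, cap_kbps)


# Hypothetical profile: (bitrate in kbps, MLLM answer accuracy). Not from the paper.
curve = [(200, 0.52), (400, 0.68), (800, 0.74), (1600, 0.80), (3200, 0.805)]
cap = saturation_bitrate(curve)
```

With this made-up profile, accuracy barely improves beyond 1600 kbps, so the sender would hold its target at the cap when the bandwidth estimate is higher and follow the estimate when it is lower.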
Taxonomy
Topics: Image and Video Quality Assessment · Advanced Data and IoT Technologies · Advanced Neural Network Applications
