Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI

Jiangkai Wu; Zhiyuan Ren; Liming Liu; Xinggong Zhang

arXiv:2507.10510 · cs.NI · November 25, 2025

TL;DR

This paper introduces AI Video Chat, a form of real-time communication in which one peer is a Multimodal Large Language Model (MLLM) rather than a human. It identifies the low-latency video streaming challenges this raises and proposes a context-aware streaming method, together with a new benchmark (DeViBench) for evaluation.

Contribution

The paper presents DeViBench, described as the first benchmark for AI video understanding under degraded conditions, and proposes a context-aware streaming approach that reduces bitrate while maintaining MLLM accuracy.

Findings

01

Ultra-low bitrate is crucial for low latency in AI Video Chat.

02

Context-aware video streaming significantly reduces bitrate without sacrificing MLLM accuracy.

03

DeViBench provides a new standard for evaluating degraded video understanding in AI chat systems.
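The link between bitrate and latency in the first finding can be illustrated with a back-of-the-envelope model. The sketch below is not the paper's measurement setup. It assumes a deliberately simple model in which delivering one frame takes its serialization time over the bottleneck link plus a fixed one-way delay. The function name and every number are hypothetical and chosen only for illustration.

```python
def frame_transmission_ms(bitrate_kbps: float, fps: float,
                          bandwidth_kbps: float, one_way_delay_ms: float) -> float:
    """Estimate the time to deliver one average-sized frame (assumed model).

    Assumption: delivery time = serialization time over the bottleneck link
    plus a fixed one-way propagation delay. Queuing, loss, and retransmission
    are ignored.
    """
    frame_kbits = bitrate_kbps / fps  # average size of one frame in kilobits
    serialization_ms = frame_kbits / bandwidth_kbps * 1000.0
    return serialization_ms + one_way_delay_ms


# Hypothetical link: 2000 kbps bottleneck, 20 ms one-way delay, 30 fps video.
high = frame_transmission_ms(bitrate_kbps=4000, fps=30,
                             bandwidth_kbps=2000, one_way_delay_ms=20)
low = frame_transmission_ms(bitrate_kbps=400, fps=30,
                            bandwidth_kbps=2000, one_way_delay_ms=20)
print(f"4000 kbps stream: {high:.1f} ms per frame")
print(f" 400 kbps stream: {low:.1f} ms per frame")
```

Under these assumed numbers, cutting the bitrate tenfold shrinks the serialization term tenfold, so the fixed delay dominates what remains. When MLLM inference already consumes most of the response-time budget, that reduction is what keeps transmission within the small share left over.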

Abstract

AI Video Chat emerges as a new paradigm for Real-time Communication (RTC), where one peer is not a human, but a Multimodal Large Language Model (MLLM). This makes interaction between humans and AI more intuitive, as if chatting face-to-face with a real person. However, this poses significant challenges to latency, because the MLLM inference takes up most of the response time, leaving very little time for video streaming. Due to network uncertainty, transmission latency becomes a critical bottleneck preventing AI from being like a real person. To address this, we call for AI-oriented RTC research, exploring the network requirement shift from "humans watching video" to "AI understanding video". We begin by recognizing the main differences between AI Video Chat and traditional RTC. Then, through prototype measurements, we identify that ultra-low bitrate is a key factor for low latency. To…
