AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Xudong Lu; Yang Bo; Jinpeng Chen; Shuhan Li; Xintong Guo; Huankang Guan; Fang Liu; Dunyuan Xu; Peiwen Sun; Heyang Sun; Rui Liu; Hongsheng Li

arXiv:2604.04184 · cs.CV · April 7, 2026

TL;DR

AURA is an end-to-end streaming visual interaction framework that enables continuous video understanding and real-time assistance, supporting open-ended questions and proactive responses with state-of-the-art performance.

Contribution

The paper introduces a unified VideoLLM system for live video streams that integrates context management, data construction, and deployment optimization for stable long-horizon interaction.

Findings

1. Achieves state-of-the-art performance on streaming benchmarks.
2. Supports real-time question answering and proactive responses.
3. Runs at 2 FPS on high-end accelerators with integrated ASR and TTS.
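
To put the 2 FPS figure in concrete terms, here is a minimal sketch of the arithmetic, not AURA's implementation. It assumes a hypothetical 30 FPS camera feed and that each sampled frame should be handled before the next one arrives. The function names `frame_budget_ms` and `sample_indices` are illustrative only.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per processed frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def sample_indices(source_fps: float, target_fps: float, num_frames: int) -> list:
    """Indices of source frames to keep so kept frames arrive at about target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Never upsample: keep at most every source frame.
    step = max(source_fps / target_fps, 1.0)
    indices = []
    i = 0
    # Compute each index from i * step to avoid accumulating float drift.
    while int(i * step) < num_frames:
        indices.append(int(i * step))
        i += 1
    return indices


if __name__ == "__main__":
    # Assumed values: a 30 FPS source, two seconds of video (60 frames).
    print(frame_budget_ms(2.0))
    print(sample_indices(30.0, 2.0, 60))
```

Under these assumptions, a 2 FPS processing rate leaves 500 ms per sampled frame, and only every fifteenth frame of a 30 FPS source is passed to the model.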

Abstract

Video Large Language Models (VideoLLMs) have achieved strong performance on many video understanding tasks, but most existing systems remain offline and are not well-suited for live video streams that require continuous observation and timely response. Recent streaming VideoLLMs have made progress, yet current approaches often rely on decoupled trigger-response pipelines or are limited to captioning-style narration, reducing their effectiveness for open-ended question answering and long-horizon interaction. We propose AURA (Always-On Understanding and Real-Time Assistance), an end-to-end streaming visual interaction framework that enables a unified VideoLLM to continuously process video streams and support both real-time question answering and proactive responses. AURA integrates context management, data construction, training objectives, and deployment optimization for stable…

Code & Models

Repositories

aurateam2026/AURA (GitHub)

Models

aurateam/AURA (Hugging Face model, 347 downloads, 12 likes)