Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Hao Wu, Donglin Bai, Shiqi Jiang, Qianxi Zhang, Yifan Yang, Xin Ding, Ting Cao, Yunxin Liu, Fengyuan Xu

TL;DR
This paper introduces CARS, a system for continuous video activity recognition that adaptively models video context, achieving high accuracy and speed on edge devices while effectively handling multi-scale, untrimmed streaming videos.
Contribution
The paper presents a novel adaptive video context modeling approach with activity spatial feature extraction and dynamic state updates, improving recognition accuracy and efficiency.
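The summary does not say how CARS decides which spatial features are "irrelevant". The sketch below shows one generic way such a filter could work. It is not the authors' method. It assumes each frame is split into patch feature vectors, scores every patch by cosine similarity to a hypothetical activity query embedding, and keeps only the top fraction. The names `select_activity_tokens`, `query`, and `keep_ratio` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def select_activity_tokens(
    patches: List[Vector], query: Vector, keep_ratio: float
) -> List[Tuple[int, Vector]]:
    """Hypothetical spatial filter: keep the top `keep_ratio` fraction of
    patches by relevance to `query` (an assumed activity embedding).

    Returns (original_index, feature) pairs in the original spatial order,
    so later layers still see a consistent patch layout.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(keep_ratio * len(patches)))
    scores = [cosine(p, query) for p in patches]
    # Rank by descending score; ties go to the lower (earlier) index.
    top = sorted(range(len(patches)), key=lambda i: (-scores[i], i))[:k]
    return [(i, patches[i]) for i in sorted(top)]


if __name__ == "__main__":
    frame = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    kept = select_activity_tokens(frame, query=[1.0, 0.0], keep_ratio=0.5)
    print([i for i, _ in kept])
```

Here half of the four patches survive: the two most aligned with the query. Dropping the remaining patches before the heavier temporal model runs is the kind of saving that would help on edge devices. A real system would learn the scoring rather than use a fixed query.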
Findings
CARS runs at over 30 FPS on edge devices.
Outperforms baselines by up to 79.7% in accuracy.
Enhances performance on in-distribution and zero-shot datasets.
Abstract
Video activity recognition has become increasingly important in robotics and embodied AI. Recognizing continuous video activities is challenging because streaming video grows rapidly over time and contains multi-scale, untrimmed activities. We introduce CARS, a novel system that addresses these issues through adaptive video context modeling, which selectively maintains activity-related features in the temporal and spatial dimensions. CARS has two key designs. The first, activity spatial feature extraction, eliminates irrelevant visual features while maintaining recognition accuracy. The second, an activity-aware state update, introduces dynamic adaptability to better preserve the video context for multi-scale activity recognition. CARS runs at over 30 FPS on typical edge devices and outperforms all baselines by 1.2% to…
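The abstract does not specify how the activity-aware state update works. The sketch below is a hypothetical stand-in, not the CARS mechanism. It uses an exponential moving average whose update rate adapts to frame novelty. A running context state is blended with each new frame feature using the gate `g = g_min + (g_max - g_min) * novelty`, where novelty is `(1 - cos(state, frame)) / 2`. Frames that resemble the current context barely change the state, which preserves long context. Frames that look like a new activity overwrite it quickly. The function name and the `g_min` and `g_max` defaults are assumptions.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def adaptive_state_update(
    state: Sequence[float],
    frame_feat: Sequence[float],
    g_min: float = 0.05,
    g_max: float = 0.9,
) -> List[float]:
    """Hypothetical novelty-gated EMA update of a video context state.

    novelty is in [0, 1]: 0 when the frame points the same way as the state,
    1 when it points the opposite way (0.5 if either vector is all zeros).
    """
    if not 0.0 <= g_min <= g_max <= 1.0:
        raise ValueError("require 0 <= g_min <= g_max <= 1")
    novelty = (1.0 - cosine(state, frame_feat)) / 2.0
    gate = g_min + (g_max - g_min) * novelty
    # Convex blend: a small gate keeps history, a large gate adopts the new frame.
    return [(1.0 - gate) * s + gate * x for s, x in zip(state, frame_feat)]


if __name__ == "__main__":
    state = [1.0, 0.0]
    for feat in ([1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]):  # steady, drift, abrupt change
        state = adaptive_state_update(state, feat)
        print([round(v, 3) for v in state])
```

A fixed-rate EMA must choose one time scale. Making the rate depend on novelty is one simple way to let the same state track both short and long activities. This is the kind of multi-scale behavior the abstract attributes to its state update.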
