Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling

Hao Wu; Donglin Bai; Shiqi Jiang; Qianxi Zhang; Yifan Yang; Xin Ding; Ting Cao; Yunxin Liu; Fengyuan Xu

arXiv:2410.14993·cs.CV·March 14, 2025

TL;DR

This paper introduces CARS, a system for continuous video activity recognition that adaptively models video context, achieving high accuracy and speed on edge devices while effectively handling multi-scale, untrimmed streaming videos.

Contribution

The paper presents a novel adaptive video context modeling approach with activity spatial feature extraction and dynamic state updates, improving recognition accuracy and efficiency.

Findings

01. CARS runs at over 30 FPS on edge devices.

02. CARS outperforms baselines by up to 79.7% in accuracy.

03. CARS improves performance on both in-distribution and zero-shot datasets.

Abstract

Video activity recognition has become increasingly important in robotics and embodied AI. Recognizing continuous video activities poses considerable challenges due to the fast expansion of streaming video, which contains multi-scale and untrimmed activities. We introduce a novel system, CARS, to overcome these issues through adaptive video context modeling. Adaptive video context modeling refers to selectively maintaining activity-related features in the temporal and spatial dimensions. CARS has two key designs. The first is activity spatial feature extraction, which eliminates irrelevant visual features while maintaining recognition accuracy. The second is an activity-aware state update, which introduces dynamic adaptability to better preserve the video context for multi-scale activity recognition. CARS runs at more than 30 FPS on typical edge devices and outperforms all baselines by 1.2% to…
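The two designs named in the abstract can be illustrated with a toy sketch. This is not the CARS implementation, which the abstract does not detail. It assumes each frame arrives as a list of spatial feature vectors with a per-token relevance score in [0, 1] (both hypothetical inputs), keeps only the top-scoring tokens, and blends their pooled feature into a running state with a gate derived from the kept scores. The function names, the top-k rule, and the gate heuristic are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Frame = Tuple[Sequence[Vector], Sequence[float]]  # (tokens, relevance scores)


def select_tokens(tokens: Sequence[Vector], scores: Sequence[float], keep: int) -> List[Vector]:
    """Keep the `keep` highest-scoring tokens, preserving their original order."""
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def mean_pool(tokens: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors into one vector."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def gated_update(state: Vector, frame_feat: Vector, gate: float) -> Vector:
    """Blend the frame feature into the state: gate=1 overwrites, gate=0 preserves."""
    return [(1.0 - gate) * s + gate * f for s, f in zip(state, frame_feat)]


def process_stream(frames: Sequence[Frame], keep: int, state_dim: int) -> Vector:
    """Fold a stream of frames into one fixed-size context state."""
    state = [0.0] * state_dim
    for tokens, scores in frames:
        kept = select_tokens(tokens, scores, keep)
        feat = mean_pool(kept)
        # Hypothetical gate: mean relevance of the kept tokens, clipped to [0, 1].
        # Frames with little activity-related content barely change the state.
        top_scores = sorted(scores, reverse=True)[:keep]
        gate = min(1.0, max(0.0, sum(top_scores) / len(top_scores)))
        state = gated_update(state, feat, gate)
    return state
```

In this sketch the state has a fixed size no matter how long the stream runs, which is the property a streaming recognizer needs. A learned scoring model and a learned gate would replace the hand-written heuristics here.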

Taxonomy

Topics: Data Visualization and Analytics