Local Compressed Video Stream Learning for Generic Event Boundary Detection

Libo Zhang; Xin Gu; Congcong Li; Tiejian Luo; Heng Fan

arXiv:2309.15431·cs.CV·September 28, 2023

TL;DR

This paper introduces a novel end-to-end compressed video representation learning method for generic event boundary detection that leverages compressed domain information, reducing computational demands while improving detection accuracy.

Contribution

The proposed method operates directly on compressed video data and applies local temporal modeling with LSTM and attention modules for efficient boundary detection.

Findings

1. Achieves significant accuracy improvements over previous methods.
2. Operates efficiently without full video decoding.
3. Demonstrates robustness on Kinetics-GEBD and TAPOS datasets.

Abstract

Generic event boundary detection aims to localize the generic, taxonomy-free event boundaries that segment videos into chunks. Existing methods typically require video frames to be decoded before they are fed into the network; the decoded frames contain significant spatio-temporal redundancy, and decoding demands considerable computational power and storage space. To remedy these issues, we propose a novel, fully end-to-end compressed video representation learning method for event boundary detection that leverages the rich information in the compressed domain, i.e., RGB, motion vectors, residuals, and the internal group of pictures (GOP) structure, without fully decoding the video. Specifically, we use lightweight ConvNets to extract features of the P-frames in the GOPs, and a spatial-channel attention module (SCAM) is designed to refine the feature representations of the P-frames based on the compressed information…
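
The GOP structure mentioned in the abstract can be illustrated with a small sketch. It assumes a simplified stream in which each GOP starts with an I-frame (a full RGB image) followed by P-frames (stored as motion vectors and residuals), and it ignores B-frames. The function name `split_into_gops` and the string encoding of frame types are hypothetical choices made for this illustration; they are not taken from the paper or its repository.

```python
from typing import List


def split_into_gops(frame_types: str) -> List[str]:
    """Split a frame-type string such as "IPPPIPP" into GOPs.

    Assumption for this sketch: a new GOP begins at every I-frame and
    runs until the next I-frame. Frames that appear before the first
    I-frame are collected into a leading group of their own.
    """
    gops: List[str] = []
    for frame_type in frame_types:
        if frame_type == "I" or not gops:
            # Start a new group at an I-frame (or at the very first frame).
            gops.append(frame_type)
        else:
            # P-frames are appended to the group opened by the last I-frame.
            gops[-1] += frame_type
    return gops


# Hypothetical frame-type sequence, used only to exercise the function.
sequence = "IPPPIPPPIPP"
gops = split_into_gops(sequence)
print(gops)  # ['IPPP', 'IPPP', 'IPP']

# Only the I-frames hold complete images; the P-frames are described by
# motion vectors and residuals relative to earlier frames.
print(sum(g.count("I") for g in gops), "I-frames,",
      sum(g.count("P") for g in gops), "P-frames")  # 3 I-frames, 8 P-frames
```

In this toy sequence only 3 of 11 frames are stored as full images, which is the kind of redundancy a compressed-domain method can exploit by extracting P-frame features with lightweight networks.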

Code & Models

Repositories

gx77/lcvsl (PyTorch, official)

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings