End-to-End Compressed Video Representation Learning for Generic Event Boundary Detection

Congcong Li, Xinyao Wang, Longyin Wen, Dexiang Hong, Tiejian Luo, Libo Zhang

arXiv:2203.15336 · cs.CV · March 30, 2022

TL;DR

This paper introduces an end-to-end compressed video representation learning approach for generic event boundary detection that operates directly in the compressed domain, significantly reducing computational costs while maintaining competitive accuracy.

Contribution

It proposes a method that uses compressed-domain data, including motion vectors and residuals, to detect event boundaries without fully decoding the video, which improves efficiency.
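To make the compressed-domain idea concrete, here is a minimal sketch, assuming a stream that contains only I- and P-frames and in which every group of pictures (GOP) starts with an I-frame. It groups a frame-type sequence into GOPs so that, following the abstract's description, only the I-frames would go through a full ConvNet while the P-frames are handled by a lighter encoder. The `GOP` class and `split_into_gops` function are hypothetical illustrations, not code from the paper, and B-frames are deliberately not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOP:
    """One group of pictures: an I-frame plus the P-frames that depend on it."""
    i_frame: int                                        # index of the I-frame
    p_frames: List[int] = field(default_factory=list)  # indices of dependent P-frames


def split_into_gops(frame_types: str) -> List[GOP]:
    """Group a frame-type string such as "IPPPIPP" into GOPs.

    Hypothetical helper; assumes only 'I' and 'P' frames and that the
    stream starts with an I-frame.
    """
    gops: List[GOP] = []
    for idx, kind in enumerate(frame_types):
        if kind == "I":
            gops.append(GOP(i_frame=idx))
        elif kind == "P":
            if not gops:
                raise ValueError("stream must start with an I-frame")
            gops[-1].p_frames.append(idx)
        else:
            raise ValueError(f"unsupported frame type {kind!r}")
    return gops


gops = split_into_gops("IPPPIPPP")
heavy = len(gops)                           # frames that would need a full ConvNet pass
light = sum(len(g.p_frames) for g in gops)  # frames left to the light-weight encoder
print(heavy, light)  # 2 6
```

In this toy stream only 2 of 8 frames would need the expensive backbone; the actual ratio depends on the GOP size chosen by the encoder that produced the video.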

Findings

01

Achieves comparable accuracy to state-of-the-art methods.

02

Runs 4.5 times faster than the state-of-the-art methods it is compared with.

03

Exploits compressed-domain information (motion vectors, residuals, and GOP structure) for boundary detection.

Abstract

Generic event boundary detection aims to localize generic, taxonomy-free event boundaries that segment videos into chunks. Existing methods typically require video frames to be decoded before they are fed into the network, which demands considerable computational power and storage space. To address this, we propose a new end-to-end compressed video representation learning method for event boundary detection that leverages the rich information in the compressed domain, i.e., RGB, motion vectors, residuals, and the internal group of pictures (GOP) structure, without fully decoding the video. Specifically, we first use ConvNets to extract features of the I-frames in the GOPs. After that, a light-weight spatial-channel compressed encoder is designed to compute the feature representations of the P-frames based on the motion vectors, residuals, and representations of their dependent I-frames. A…
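The abstract is truncated before it details the spatial-channel compressed encoder, so the following is only a parameter-free sketch of the general idea, not the authors' design. Under stated assumptions, it builds a P-frame feature from its dependent I-frame feature by (1) nearest-neighbour warping along motion vectors, assumed to be per-location `(dy, dx)` offsets pointing into the I-frame, (2) adding a residual feature map, assumed to already have the same channel count, and (3) re-weighting with sigmoid channel and spatial gates in the spirit of squeeze-and-excitation. The names `warp` and `p_frame_feature` are hypothetical; a real model would use learned convolutions on tensors in a deep learning framework.

```python
import math
from typing import List, Tuple

Feature = List[List[List[float]]]     # H x W x C feature map
Motion = List[List[Tuple[int, int]]]  # H x W grid of (dy, dx) offsets (assumed convention)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def warp(i_feat: Feature, mv: Motion) -> Feature:
    """Nearest-neighbour warp: each P-frame location copies the I-frame feature
    at the location its motion vector points to, clamped to the map borders."""
    h, w = len(i_feat), len(i_feat[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = mv[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            row.append(list(i_feat[sy][sx]))
        out.append(row)
    return out


def p_frame_feature(i_feat: Feature, mv: Motion, res_feat: Feature) -> Feature:
    """Hypothetical P-frame feature: warped I-frame feature plus residual feature,
    re-weighted by parameter-free channel and spatial gates."""
    fused = [[[a + b for a, b in zip(fw, fr)]
              for fw, fr in zip(row_w, row_r)]
             for row_w, row_r in zip(warp(i_feat, mv), res_feat)]
    h, w, c = len(fused), len(fused[0]), len(fused[0][0])
    # Channel gate: sigmoid of each channel's spatial mean.
    ch_gate = [sigmoid(sum(fused[y][x][k] for y in range(h) for x in range(w)) / (h * w))
               for k in range(c)]
    # Spatial gate: sigmoid of each location's mean over channels.
    return [[[v * ch_gate[k] * sigmoid(sum(fused[y][x]) / c)
              for k, v in enumerate(fused[y][x])]
             for x in range(w)]
            for y in range(h)]


i_feat = [[[1.0], [2.0]], [[3.0], [4.0]]]  # 2 x 2 map, 1 channel
mv = [[(0, 1), (0, 1)], [(0, 1), (0, 1)]]   # every location points one column right
zeros = [[[0.0], [0.0]], [[0.0], [0.0]]]
print(warp(i_feat, mv))  # [[[2.0], [2.0]], [[4.0], [4.0]]]
```

The point of the design is that only cheap per-location operations are applied to P-frames, while the expensive ConvNet runs once per GOP on the I-frame.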