Generic Event Boundary Detection in Video with Pyramid Features
Van Thong Huynh, Hyung-Jeong Yang, Guee-Sang Lee, Soo-Hyung Kim

TL;DR
This paper presents a method for generic event boundary detection (GEBD) in video that builds pyramid features from the similarity between neighboring frames across spatial and temporal scales, and is reported to outperform existing approaches.
Contribution
The study proposes a framework that combines pyramid feature maps from a pre-trained ResNet-50 with a 1D-convolutional decoder over neighbor-frame similarities to estimate event boundary scores.
Findings
Outperforms state-of-the-art on GEBD benchmark
Effective on long-form Olympic sport videos
Utilizes multi-scale spatial-temporal features
Abstract
Generic event boundary detection (GEBD) aims to split video into chunks at a broad and diverse set of actions as humans naturally perceive event boundaries. In this study, we present an approach that considers the correlation between neighbor frames with pyramid feature maps in both spatial and temporal dimensions to construct a framework for localizing generic events in video. The features at multiple spatial dimensions of a pre-trained ResNet-50 are exploited with different views in the temporal dimension to form a temporal pyramid feature map. Based on that, the similarity between neighbor frames is calculated and projected to build a temporal pyramid similarity feature vector. A decoder with 1D convolution operations is used to decode these similarities to a new representation that incorporates their temporal relationship for later boundary score estimation. Extensive experiments…
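The similarity step described in the abstract can be illustrated with a toy sketch. This is not the authors' model: it assumes per-frame feature vectors are already available (the paper takes them from a pre-trained ResNet-50), uses cosine similarity at a few hypothetical temporal offsets in place of the temporal pyramid, and substitutes a fixed smoothing kernel for the learned 1D-convolution decoder. The names `neighbor_similarity`, `conv1d_same`, `boundary_scores`, and the default `offsets` and `kernel` values are all illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def neighbor_similarity(frames, offsets=(1, 2, 4)):
    """For each frame t, compare the frame k steps before with the frame
    k steps after, for every k in `offsets` (hypothetical temporal scales).
    Indices are clamped at the ends of the video."""
    n = len(frames)
    sims = []
    for t in range(n):
        row = []
        for k in offsets:
            left = frames[max(t - k, 0)]
            right = frames[min(t + k, n - 1)]
            row.append(cosine(left, right))
        sims.append(row)
    return sims

def conv1d_same(signal, kernel):
    """1D cross-correlation along time with zero padding; the output has the
    same length as the input. Assumes an odd-length kernel."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            i = t + j - half
            if 0 <= i < n:
                acc += w * signal[i]
        out.append(acc)
    return out

def boundary_scores(frames, offsets=(1, 2, 4), kernel=(0.25, 0.5, 0.25)):
    """Toy boundary score: average dissimilarity across temporal scales,
    smoothed over time. The fixed kernel stands in for a learned decoder."""
    sims = neighbor_similarity(frames, offsets)
    dissim = [1.0 - sum(row) / len(row) for row in sims]
    return conv1d_same(dissim, kernel)

# Two synthetic "events": four frames of one feature, then four of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
scores = boundary_scores(frames)
peak = max(range(len(scores)), key=lambda t: scores[t])
print(peak, [round(s, 3) for s in scores])
```

In this synthetic example the score peaks at the frames on either side of the feature change, which is the behavior a boundary detector is meant to capture. A learned decoder would replace both the averaging and the fixed kernel with trained weights.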
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Analysis and Summarization
Methods: Convolution
