Structured Context Learning for Generic Event Boundary Detection

Xin Gu; Congcong Li; Xinyao Wang; Dexiang Hong; Libo Zhang; Tiejian Luo; Longyin Wen; Heng Fan

arXiv:2512.00475 · cs.CV · December 2, 2025

TL;DR

This paper introduces Structured Context Learning, which uses a Structured Partition of Sequence (SPoS) to detect generic event boundaries in videos efficiently and accurately. It reports outperforming existing methods on Kinetics-GEBD, TAPOS, and shot transition datasets.

Contribution

It proposes a novel Structured Partition of Sequence (SPoS) method that provides structured context for temporal learning, improving speed and accuracy in event boundary detection.

Findings

01. Achieves better speed-accuracy trade-off than prior methods

02. Demonstrates superior performance on Kinetics-GEBD, TAPOS, and shot transition datasets

03. SPoS's complexity is linear with video length

Abstract

Generic Event Boundary Detection (GEBD) aims to identify moments in videos that humans perceive as event boundaries. This paper proposes a novel method for addressing this task, called Structured Context Learning, which introduces the Structured Partition of Sequence (SPoS) to provide a structured context for learning temporal information. Our approach is end-to-end trainable and flexible, not restricted to specific temporal models like GRU, LSTM, and Transformers. This flexibility enables our method to achieve a better speed-accuracy trade-off. Specifically, we apply SPoS to partition the input frame sequence and provide a structured context for the subsequent temporal model. Notably, SPoS's overall computational complexity is linear with respect to the video length. We next calculate group similarities to capture differences between frames, and a lightweight fully convolutional…
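
The partition-then-compare idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the paper's definition of SPoS: the sketch assumes frames are already encoded as feature vectors, that the partition is a simple split into consecutive, non-overlapping groups of a fixed size, and that "group similarity" is the cosine similarity between the mean vectors of adjacent groups. The function names (`partition_sequence`, `adjacent_group_similarities`) and the `group_size` parameter are hypothetical. The one property it shares with the abstract's claim is that the partition touches each frame once, so its cost grows linearly with the number of frames.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def partition_sequence(frames: Sequence[Vector], group_size: int) -> List[List[Vector]]:
    """Split frame features into consecutive, non-overlapping groups.

    Hypothetical stand-in for a structured partition: each frame is
    visited once, so the cost is linear in the number of frames.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return [list(frames[i:i + group_size]) for i in range(0, len(frames), group_size)]


def mean_vector(group: Sequence[Vector]) -> List[float]:
    """Average the feature vectors of one group, dimension by dimension."""
    dim = len(group[0])
    return [sum(v[d] for v in group) / len(group) for d in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def adjacent_group_similarities(frames: Sequence[Vector], group_size: int) -> List[float]:
    """Similarity between each group and the next (assumed definition)."""
    means = [mean_vector(g) for g in partition_sequence(frames, group_size)]
    return [cosine_similarity(means[i], means[i + 1]) for i in range(len(means) - 1)]


# Toy input: four frames of one "event" followed by four frames of another.
frames = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
sims = adjacent_group_similarities(frames, group_size=2)
print(sims)  # [1.0, 0.0, 1.0]
```

In this toy input the similarity drops to 0.0 between the second and third groups, which is where the content changes. In the method described above, a learned temporal model and a lightweight convolutional component would consume such signals; the thresholding or prediction step is not sketched here because the source text is truncated before describing it.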

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Generative Adversarial Networks and Image Synthesis