Rethinking the Architecture Design for Efficient Generic Event Boundary Detection
Ziwei Zheng, Zechuan Zhang, Yulin Wang, Shiji Song, Gao Huang, Le Yang

TL;DR
This paper reexamines GEBD model architectures, revealing that simpler models and video-domain backbones can significantly improve efficiency and performance, encouraging resource-aware design.
Contribution
It introduces EfficientGEBD, a family of models that outperform state-of-the-art methods in both accuracy and speed by modernizing architecture components and using video-domain backbones.
Findings
A simple baseline model achieves promising performance without complex design.
Video-domain backbones with joint spatiotemporal modeling outperform image-domain backbones.
EfficientGEBD models outperform the previous SOTA with up to 1.7% higher accuracy and up to a 280% speedup.
Abstract
Generic event boundary detection (GEBD), inspired by the human cognitive behavior of consistently segmenting videos into meaningful temporal chunks, finds utility in various applications such as video editing. In this paper, we demonstrate that SOTA GEBD models often prioritize final performance over model complexity, resulting in low inference speed and hindering efficient deployment in real-world scenarios. We contribute to addressing this challenge by experimentally reexamining the architecture of GEBD models and uncovering several surprising findings. Firstly, we reveal that a concise GEBD baseline model already achieves promising performance without any sophisticated design. Secondly, we find that the widely applied image-domain backbones in GEBD models can contain plenty of architectural redundancy, motivating us to gradually "modernize" each component to enhance…
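To make the task concrete: a GEBD model is often assumed to emit a per-frame boundary score, which must then be turned into boundary timestamps that split the video into chunks. The sketch below shows one generic post-processing step (thresholded local-maximum picking). It is an illustrative assumption, not the paper's method; the function name `scores_to_boundaries` and the `threshold` default of 0.5 are hypothetical.

```python
from typing import List


def scores_to_boundaries(scores: List[float], fps: float, threshold: float = 0.5) -> List[float]:
    """Hypothetical helper: map per-frame boundary scores to boundary times in seconds.

    A frame is kept if its score is at least `threshold` and it is a local
    maximum: strictly greater than its left neighbour and at least as large
    as its right neighbour, so a flat plateau yields a single boundary.
    """
    boundaries: List[float] = []
    n = len(scores)
    for i, s in enumerate(scores):
        if s < threshold:
            continue  # below threshold: not a candidate boundary
        left = scores[i - 1] if i > 0 else float("-inf")
        right = scores[i + 1] if i < n - 1 else float("-inf")
        if s > left and s >= right:
            boundaries.append(i / fps)  # frame index -> timestamp
    return boundaries


if __name__ == "__main__":
    # Made-up scores for 8 frames sampled at 4 fps
    demo = [0.1, 0.2, 0.9, 0.3, 0.1, 0.7, 0.8, 0.2]
    print(scores_to_boundaries(demo, fps=4.0))
```

Requiring a local maximum rather than only a threshold keeps a run of high-scoring frames around one event change from being reported as several separate boundaries.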
