Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection
Jiaqi Tang, Zhaoyang Liu, Chen Qian, Wayne Wu, Limin Wang

TL;DR
This paper introduces DDM-Net, an end-to-end framework for generic event boundary detection in videos, leveraging multi-level features, dense difference maps, and progressive attention to improve motion and appearance modeling.
Contribution
The paper proposes a novel dense difference map approach with progressive attention on multi-level features, enhancing temporal modeling for event boundary detection.
Findings
Achieved a 14% improvement on the Kinetics-GEBD benchmark.
Achieved an 8% improvement on the TAPOS benchmark.
Outperformed the top-1 solution of the LOVEU Challenge@CVPR 2021.
Abstract
Generic event boundary detection is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries. The main challenge of this task is perceiving various temporal variations of diverse event boundaries. To this end, this paper presents an effective and end-to-end learnable framework (DDM-Net). To tackle the diversity and complicated semantics of event boundaries, we make three notable improvements. First, we construct a feature bank to store multi-level features of space and time, prepared for difference calculation at multiple scales. Second, to alleviate inadequate temporal modeling of previous methods, we present dense difference maps (DDM) to comprehensively characterize the motion pattern. Finally, we exploit progressive attention on multi-level DDM to jointly aggregate appearance and motion clues. As a…
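The abstract describes the pipeline only at a high level. The sketch below shows the general idea of a dense difference map under stated assumptions. Each level of a hypothetical feature bank is a list of per-frame feature vectors. The "difference" between two frames is taken to be cosine distance. The map stores that distance for every pair of frames in a temporal window. DDM-Net itself learns from these maps and fuses the levels with progressive attention, and neither step is reproduced here. The `novelty_score` function is a swapped-in, hand-crafted cue (a Foote-style checkerboard contrast) that only illustrates how a pairwise difference map can expose a boundary at the window center. All function names are illustrative, not the authors' code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0.0 for parallel vectors, 2.0 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # assumption: a zero vector counts as "no difference"
    return 1.0 - dot / norm


def dense_difference_map(window: List[Vector]) -> Matrix:
    """T x T map holding the distance between every pair of frames in the window."""
    return [[cosine_distance(fi, fj) for fj in window] for fi in window]


def multi_level_ddm(feature_bank: List[List[Vector]]) -> List[Matrix]:
    """One difference map per level of a (hypothetical) feature bank.

    feature_bank[level][t] is the feature vector of frame t at that level;
    levels may differ in feature dimension but share the same window length T.
    """
    return [dense_difference_map(level) for level in feature_bank]


def novelty_score(ddm: Matrix) -> float:
    """Hand-crafted boundary cue (Foote-style checkerboard contrast), NOT DDM-Net's
    learned head: mean distance across the window center minus mean distance
    within each half. Large values suggest a boundary at the center."""
    t = len(ddm)
    if t < 4:
        raise ValueError("need at least 4 frames in the window")
    c = t // 2  # frames [0, c) lie before the candidate boundary, [c, t) after it
    cross, within = [], []
    for i in range(t):
        for j in range(t):
            if i == j:
                continue  # skip the all-zero diagonal
            if (i < c) == (j < c):
                within.append(ddm[i][j])
            else:
                cross.append(ddm[i][j])
    return sum(cross) / len(cross) - sum(within) / len(within)


if __name__ == "__main__":
    bank = [
        # level 0: the content switches halfway through the window
        [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]],
        # level 1: nothing changes
        [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]],
    ]
    maps = multi_level_ddm(bank)
    print([round(novelty_score(m), 3) for m in maps])  # [1.0, 0.0]
```

Computing one map per level mirrors the idea of taking differences at multiple scales: a change that is invisible in one level (level 1 above) can still stand out in another (level 0). How the per-level maps are then weighted and combined is exactly what the paper's progressive attention addresses, so the sketch deliberately stops before that step.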
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Generative Adversarial Networks and Image Synthesis
