Progressive Attention on Multi-Level Dense Difference Maps for Generic Event Boundary Detection

Jiaqi Tang; Zhaoyang Liu; Chen Qian; Wayne Wu; Limin Wang

arXiv:2112.04771 · cs.CV · April 4, 2022 · 1 citation

Open Access · 3 Repos

TL;DR

This paper introduces DDM-Net, an end-to-end framework for generic event boundary detection in videos, leveraging multi-level features, dense difference maps, and progressive attention to improve motion and appearance modeling.

Contribution

The paper proposes a novel dense difference map approach with progressive attention on multi-level features, enhancing temporal modeling for event boundary detection.

Findings

01. Achieved a 14% improvement on the Kinetics-GEBD benchmark.

02. Achieved an 8% improvement on the TAPOS benchmark.

03. Outperformed the top-1 solution of the LOVEU Challenge@CVPR 2021.

Abstract

Generic event boundary detection is an important yet challenging task in video understanding, which aims at detecting the moments where humans naturally perceive event boundaries. The main challenge of this task is perceiving various temporal variations of diverse event boundaries. To this end, this paper presents an effective and end-to-end learnable framework (DDM-Net). To tackle the diversity and complicated semantics of event boundaries, we make three notable improvements. First, we construct a feature bank to store multi-level features of space and time, prepared for difference calculation at multiple scales. Second, to alleviate inadequate temporal modeling of previous methods, we present dense difference maps (DDM) to comprehensively characterize the motion pattern. Finally, we exploit progressive attention on multi-level DDM to jointly aggregate appearance and motion clues. As a…
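To make the idea of a dense difference map concrete, here is a minimal sketch, not the authors' implementation. It assumes per-frame features of shape (T, C) and uses squared Euclidean distance as the difference operator; the abstract does not specify the operator, and the function names `dense_difference_map` and `multi_level_ddm` are hypothetical. Each map entry [i, j] compares frame i with frame j, so an abrupt change between two groups of frames shows up as a block pattern.

```python
import numpy as np


def dense_difference_map(features: np.ndarray) -> np.ndarray:
    """Pairwise difference map for one feature level.

    features: array of shape (T, C), one C-dimensional vector per frame.
    Returns an array of shape (T, T) whose entry [i, j] is the squared
    Euclidean distance between frame i and frame j (an assumed choice of
    difference operator, used here only for illustration).
    """
    diff = features[:, None, :] - features[None, :, :]  # shape (T, T, C)
    return (diff ** 2).sum(axis=-1)


def multi_level_ddm(feature_bank):
    """Apply the same map to every level of a (hypothetical) feature bank."""
    return [dense_difference_map(level) for level in feature_bank]


# Toy input: 6 frames, 4 channels, with an abrupt change after frame 2.
features = np.concatenate([np.zeros((3, 4)), np.ones((3, 4))], axis=0)
ddm = dense_difference_map(features)

# A second, coarser level obtained by averaging channel pairs (assumption).
coarse = features.reshape(6, 2, 2).mean(axis=-1)
maps = multi_level_ddm([features, coarse])
```

In this toy case, frames on the same side of the change have zero difference and frames on opposite sides have a large one, which is the kind of pattern a boundary detector could pick up from the map.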

Taxonomy

Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Generative Adversarial Networks and Image Synthesis