BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation
Haisheng Su, Weihao Gan, Wei Wu, Yu Qiao, Junjie Yan

TL;DR
BSN++ introduces a novel framework for temporal action proposal generation that combines a boundary regressor with relation modeling, significantly improving boundary precision and proposal quality in untrimmed videos.
Contribution
The paper proposes a new boundary regressor with a U-shaped architecture and bi-directional matching, along with a proposal relation block using self-attention, and a scale-balanced re-sampling strategy for better performance.
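One way to picture the idea behind scale-balanced re-sampling is the minimal sketch below. It is a generic inverse-frequency scheme, not the authors' procedure: the function names, the four duration bins, and the use of normalized proposal durations are all assumptions made for illustration.

```python
import random
from collections import Counter


def scale_bin(duration, num_bins=4):
    """Map a normalized proposal duration in (0, 1] to a scale-bin index.

    The number of bins is a hypothetical choice, not taken from the paper.
    """
    return min(int(duration * num_bins), num_bins - 1)


def balanced_sampling_weights(durations, num_bins=4):
    """Inverse-frequency weights: every occupied scale bin gets equal total mass."""
    bins = [scale_bin(d, num_bins) for d in durations]
    counts = Counter(bins)
    return [1.0 / (len(counts) * counts[b]) for b in bins]


def resample(proposals, durations, k, num_bins=4, seed=0):
    """Draw k proposals with replacement, favoring under-represented scales."""
    rng = random.Random(seed)
    weights = balanced_sampling_weights(durations, num_bins)
    return rng.choices(proposals, weights=weights, k=k)
```

With four short proposals and one long proposal, the long one receives as much total sampling mass as the four short ones combined, so a training batch drawn this way is no longer dominated by one scale.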
Findings
Achieves state-of-the-art results on ActivityNet-1.3 and THUMOS14.
Ranked 1st on the CVPR19 ActivityNet challenge leaderboard for temporal action localization.
Demonstrates improved boundary precision and proposal quality.
Abstract
Generating human action proposals in untrimmed videos is an important yet challenging task with wide applications. Current methods often suffer from noisy boundary locations and from low-quality confidence scores used for proposal retrieval. In this paper, we present BSN++, a new framework which exploits a complementary boundary regressor and relation modeling for temporal proposal generation. First, we propose a novel boundary regressor based on the complementary characteristics of the starting and ending boundary classifiers. Specifically, we utilize a U-shaped architecture with nested skip connections to capture rich contexts and introduce a bi-directional boundary matching mechanism to improve boundary precision. Second, to account for the proposal-proposal relations ignored in previous methods, we devise a proposal relation block which includes two self-attention…
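The self-attention mechanism that the proposal relation block builds on can be illustrated with the minimal sketch below. It shows only generic scaled dot-product self-attention over a set of proposal feature vectors. It uses identity query, key, and value maps with no learned weights, and it does not reproduce the design of the two modules described in the paper; the function names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over N proposal feature vectors.

    features: list of N vectors, each of length C.
    Returns N vectors, each a weighted average of all input vectors.
    Assumption: queries, keys, and values are the raw features themselves.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for query in features:
        # Similarity of this proposal to every proposal (itself included).
        scores = [scale * sum(a * b for a, b in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Aggregate all proposal features according to the attention weights.
        attended.append(
            [sum(w * value[c] for w, value in zip(weights, features)) for c in range(dim)]
        )
    return attended
```

Each output vector mixes information from every other proposal, which is how proposal-proposal relations enter the representation in this kind of block.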
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Video Analysis and Summarization
