A novel learning-based frame pooling method for Event Detection
Lan Wang, Chenqiang Gao, Jiang Liu, Deyu Meng

TL;DR
This paper introduces a learning-based frame pooling method for event detection in videos, which automatically learns optimal pooling weights for different event categories, improving detection accuracy over traditional pooling methods.
Contribution
It proposes a novel optimization-based pooling approach that adapts pooling weights to specific event categories, enhancing video event detection performance.
Findings
Outperforms average pooling and max pooling on TRECVID MED 2011
Automatically learns a pooling weight configuration for each event category
Improves on the baselines for both high-level and low-level 2D image features
Abstract
Detecting complex events in a large video collection crawled from video websites is a challenging task. When strong image-based feature representations, e.g., HOG and SIFT, are applied directly to videos, we face the problem of how to pool multiple frame-level feature representations into a single feature representation. In this paper, we propose a novel learning-based frame pooling method. We formulate pooling weight learning as an optimization problem, so our method can automatically learn the best pooling weight configuration for each specific event category. Experimental results on TRECVID MED 2011 show that our method outperforms the commonly used average pooling and max pooling strategies on both high-level and low-level 2D image features.
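To make the pooling problem concrete, here is a minimal sketch of weighted frame pooling. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the weights are applied, so this sketch assumes each feature dimension is sorted across frames and then combined with one weight per rank. Under that assumption, average pooling and max pooling are two fixed weight vectors, and a learned vector can fall anywhere between them. The function name `pool_frames` and the example weights are hypothetical.

```python
import numpy as np


def pool_frames(frames, weights):
    """Pool a (T, D) array of frame features into one D-dimensional vector.

    Assumption (not taken from the paper): each feature dimension is sorted
    in descending order across the T frames, and `weights` holds one weight
    per rank position.
    """
    frames = np.asarray(frames, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if frames.ndim != 2 or weights.shape != (frames.shape[0],):
        raise ValueError("expected frames of shape (T, D) and weights of shape (T,)")
    # Sort every column independently, largest value first.
    ranked = -np.sort(-frames, axis=0)
    # Weighted sum over rank positions gives one value per dimension.
    return weights @ ranked


# Toy input: T = 4 frames, D = 2 feature dimensions.
frames = np.array([
    [0.1, 0.9],
    [0.4, 0.2],
    [0.3, 0.5],
    [0.2, 0.4],
])
T = frames.shape[0]

avg_weights = np.full(T, 1.0 / T)                # uniform weights: average pooling
max_weights = np.eye(T)[0]                       # all mass on rank 0: max pooling
example_weights = np.array([0.5, 0.3, 0.2, 0.0]) # hypothetical "learned" weights

avg_pooled = pool_frames(frames, avg_weights)
max_pooled = pool_frames(frames, max_weights)
example_pooled = pool_frames(frames, example_weights)
```

In the paper's setting the weight vector would come out of the optimization for each event category rather than being fixed by hand; the sketch only shows how a given weight vector turns frame features into a single video-level feature.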
Taxonomy
Topics: Video Analysis and Summarization · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
Methods: Average Pooling
