TL;DR
This paper introduces a dynamic convolution approach for modeling the temporal receptive field of concepts in untrimmed videos, enabling adaptive and accurate event recognition by adjusting to input variations.
Contribution
It proposes the temporal dynamic convolution (TDC) and the TDCMN network, which adaptively model concept behaviors over time, significantly improving event recognition in untrimmed videos.
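To make the idea of an input-dependent temporal receptive field more concrete, here is a minimal sketch in plain Python. It is not the paper's TDC or TDCMN implementation. It assumes one common form of dynamic convolution: several temporal kernels of different sizes are applied to a single concept's score sequence, and a gate computed from the input decides how to mix them. The function names, the kernel sizes, the box (averaging) kernels, and the `gate_slopes` parameter are all hypothetical choices made for illustration; in a trained model the kernels and the gate would be learned.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_conv(seq, kernel):
    """Same-length 1D cross-correlation over time with zero padding."""
    half = len(kernel) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out


def dynamic_temporal_conv(seq, kernel_sizes=(1, 3, 5),
                          gate_slopes=(-2.0, 0.0, 2.0)):
    """Mix temporal convolutions of several sizes with input-dependent weights.

    Assumption (not from the paper): the gate is a linear function of the
    mean concept score followed by a softmax, and every kernel is a fixed
    box filter standing in for learned weights.
    """
    mean_score = sum(seq) / len(seq)
    # One gate logit per kernel size, computed from the input itself.
    weights = softmax([s * mean_score for s in gate_slopes])
    # One branch per temporal extent (receptive field size).
    branches = [temporal_conv(seq, [1.0 / k] * k) for k in kernel_sizes]
    out = [
        sum(w * branch[t] for w, branch in zip(weights, branches))
        for t in range(len(seq))
    ]
    return out, weights


# Hypothetical per-segment scores for one concept in one video.
scores = [0.0, 0.1, 0.9, 1.0, 0.8, 0.1, 0.0]
mixed, gate = dynamic_temporal_conv(scores)
```

Because the gate weights depend on `seq`, two videos with different concept score patterns are smoothed over different effective temporal extents, which is the adaptive behavior the summary describes. A fixed-kernel convolution would apply the same extent to every video.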
Findings
TDCMN outperforms existing methods on FCVID and ActivityNet datasets.
Adaptive modeling of concept receptive fields enhances event recognition accuracy.
The approach effectively captures complex concept interactions in untrimmed videos.
Abstract
Event analysis in untrimmed videos has attracted increasing attention due to the application of cutting-edge techniques such as CNNs. As a well-studied property of CNN-based models, the receptive field measures the spatial range covered by a single feature response, which is crucial for improving image categorization accuracy. In the video domain, event semantics are described by complex interactions among different concepts, and the behavior of these concepts varies drastically from one video to another, which makes concept-based analytics for accurate event categorization difficult. To model concept behavior, we study the temporal concept receptive field of concept-based event representation, which encodes the temporal occurrence pattern of different mid-level concepts. Accordingly, we introduce temporal dynamic convolution (TDC) to give stronger flexibility…
Taxonomy
Methods: Convolution
