Motion meets Attention: Video Motion Prompts
Qixiang Chen, Lei Wang, Piotr Koniusz, Tom Gedeon

TL;DR
This paper introduces a novel motion prompt layer that uses attention mechanisms and regularization to extract and highlight relevant motion cues in videos, improving action recognition performance.
Contribution
The paper proposes a learnable, attention-based motion prompt layer, regularized for temporal smoothness, that improves motion feature extraction in video models.
Findings
Enhanced action recognition accuracy on benchmarks
Seamless integration with existing models like SlowFast and TimeSformer
Effective suppression of noise in motion signals
Abstract
Videos contain rich spatio-temporal information. Traditional methods for extracting motion, used in tasks such as action recognition, often rely on visual content rather than precise motion features. This phenomenon is referred to as 'blind motion extraction' behavior, which proves inefficient in capturing motions of interest due to a lack of motion-guided cues. Recently, attention mechanisms have enhanced many computer vision tasks by effectively highlighting salient visual areas. Inspired by this, we propose a modified Sigmoid function with learnable slope and shift parameters as an attention mechanism to modulate motion signals from frame differencing maps. This approach generates a sequence of attention maps that enhance the processing of motion-related video content. To ensure temporal continuity and smoothness of the attention maps, we apply pair-wise temporal attention variation…
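The abstract describes the mechanism only in words. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation. It assumes the following:

- Frame differencing is the absolute difference of consecutive grayscale frames.
- The modified Sigmoid has the form 1 / (1 + exp(-a(d - b))), with slope a and shift b.
- Motion prompts are formed by multiplying the attention maps element-wise with the frames.
- The pair-wise temporal attention variation regularizer is the sum of squared differences between consecutive attention maps.

The function names (`frame_differences`, `modified_sigmoid`, `temporal_variation_penalty`, `motion_prompt_layer`) and the default values are hypothetical. In the actual layer the slope and shift are learned, which would need an autodiff framework rather than plain NumPy.

```python
import numpy as np


def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Absolute differences between consecutive frames.

    frames: array of shape (T, H, W), e.g. grayscale intensities in [0, 1].
    Returns an array of shape (T - 1, H, W).
    """
    return np.abs(np.diff(frames, axis=0))


def modified_sigmoid(d: np.ndarray, slope: float, shift: float) -> np.ndarray:
    """Sigmoid with a slope and a shift, applied to difference maps.

    Assumed functional form. In a trained model `slope` and `shift` would be
    learnable parameters; here they are plain floats (hypothetical names).
    """
    return 1.0 / (1.0 + np.exp(-slope * (d - shift)))


def temporal_variation_penalty(attention: np.ndarray) -> float:
    """Assumed form of the pair-wise temporal regularizer: the sum of squared
    differences between consecutive attention maps. Smooth, temporally
    consistent attention gives a small value.
    """
    return float(np.sum(np.diff(attention, axis=0) ** 2))


def motion_prompt_layer(frames: np.ndarray, slope: float = 20.0, shift: float = 0.2):
    """Forward pass of a hypothetical motion prompt layer (no training)."""
    diffs = frame_differences(frames)
    attention = modified_sigmoid(diffs, slope, shift)
    # Assumption: each attention map re-weights the frame it ends on,
    # so pixels that changed are emphasized before the video backbone.
    prompts = attention * frames[1:]
    return attention, prompts, temporal_variation_penalty(attention)


# Toy clips: a 2x2 square that stays still vs. one that moves right.
static = np.zeros((4, 8, 8))
static[:, 2:4, 2:4] = 1.0
moving = np.zeros((4, 8, 8))
for t in range(4):
    moving[t, 2:4, t:t + 2] = 1.0

att_static, _, pen_static = motion_prompt_layer(static)
att_moving, prompts_moving, pen_moving = motion_prompt_layer(moving)
print(att_static.max(), att_moving.max())  # low vs. close to 1
print(pen_static, pen_moving)              # 0.0 vs. a positive value
```

With a large slope, the sigmoid acts as a soft threshold at the shift. Difference values below the shift (still pixels or small noise) are pushed toward 0, and values above it are pushed toward 1. This is one way to read the claim that the attention maps highlight motions of interest. The penalty is zero only when consecutive attention maps are identical, so adding it to a training loss favors temporally smooth attention.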
Taxonomy
Topics: Cinema and Media Studies · Data Visualization and Analytics
Methods: Softmax · Attention Is All You Need · TimeSformer · Adapter
