Query Twice: Dual Mixture Attention Meta Learning for Video Summarization
Junyan Wang, Yang Bai, Yang Long, Bingzhang Hu, Zhenhua Chai, Yu Guan, and Xiaolin Wei

TL;DR
This paper introduces DMASum, a video summarization framework that combines dual mixture attention with meta learning to tackle the softmax bottleneck, capturing complex visual and sequential information and generalizing better from small datasets.
Contribution
The paper proposes a dual mixture attention model with meta learning for video summarization: the mixture of attention increases model capacity, and the meta learning rule improves generalization on small datasets.
Findings
Significant improvements over state-of-the-art methods on the SumMe and TVSum datasets.
Capture of second-order changes by applying a second self-query attention on top of the initial query-key attention.
Better generalization to small datasets through a novel Single Frame Meta Learning rule.
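The finding on second-order changes refers to attending twice. This page does not give the exact Mixture of Attention formulation, so the NumPy sketch below assumes one plausible reading purely for illustration: an ordinary query-key attention pass over frame features, followed by a second pass in which the first-pass outputs act as their own queries, keys, and values. The function `query_twice` and the weight matrices `wq`, `wk`, `wv` are hypothetical names, not the authors' code.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Scaled dot-product attention: softmax(q k^T / sqrt(d)) v.
    d = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return weights @ v, weights

def query_twice(x, wq, wk, wv):
    """Hypothetical two-pass attention over frame features x (T x D).

    Pass 1: ordinary query-key attention over the frames.
    Pass 2: the pass-1 outputs query themselves (self-query), so the
    second attention weights depend on how pass 1 mixed the frames.
    This is an illustrative assumption, not the DMASum layer.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    out1, w1 = attend(q, k, v)
    out2, w2 = attend(out1, out1, out1)
    return out2, w1, w2

rng = np.random.default_rng(0)
T, D, d = 6, 8, 4  # frames, input feature size, attention size
x = rng.standard_normal((T, D))
wq, wk, wv = (rng.standard_normal((D, d)) for _ in range(3))
out, w1, w2 = query_twice(x, wq, wk, wv)
print(out.shape, w1.shape, w2.shape)  # (6, 4) (6, 6) (6, 6)
```

Under this reading, the second weight matrix `w2` is computed from features that already encode the first attention pattern, which is one way to model "changes of changes" between frames.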
Abstract
Video summarization aims to select representative frames that retain high-level information, and it is usually solved by predicting segment-wise importance scores with a softmax function. However, the softmax function struggles to retain high-rank representations of complex visual or sequential information, a limitation known as the Softmax Bottleneck problem. In this paper, we propose a novel framework, the Dual Mixture Attention (DMASum) model with Meta Learning, for video summarization that tackles the softmax bottleneck problem. Its Mixture of Attention (MoA) layer effectively increases model capacity by applying self-query attention a second time, which captures second-order changes in addition to the initial query-key attention, and a novel Single Frame Meta Learning rule is then introduced to generalize better to small datasets with limited training sources.…
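The bottleneck argument in the abstract can be checked numerically. If N contexts have d-dimensional features h and M outcomes have embeddings w, the logits h w^T have rank at most d, and subtracting the row-wise log-normalizer adds at most one more, so a single softmax can only produce log-probability matrices of rank at most d + 1. The NumPy sketch below demonstrates this and contrasts it with a mixture of softmaxes, the remedy proposed for language models by Yang et al. (2018, "Breaking the Softmax Bottleneck"). It is a swapped-in illustration of why mixing raises capacity, not DMASum's Mixture of Attention layer; all names, shapes, and random weights are assumptions.

```python
import numpy as np

def log_softmax(logits):
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def softmax(logits):
    return np.exp(log_softmax(logits))

def mixture_of_softmaxes(h, w, proj, gate):
    """Mixture of K softmaxes in the style of Yang et al. (2018).

    h:    (N, d) context features
    w:    (M, d) output embeddings shared by all components
    proj: (K, d, d) per-component projections (illustrative)
    gate: (d, K) weights that produce the mixing coefficients
    """
    pi = softmax(h @ gate)  # (N, K); each row sums to 1
    comps = np.stack(
        [softmax(np.tanh(h @ p) @ w.T) for p in proj], axis=1
    )  # (N, K, M); each (n, k) row sums to 1
    return (pi[:, :, None] * comps).sum(axis=1)  # (N, M)

rng = np.random.default_rng(0)
N, M, d, K = 50, 40, 4, 3
h = rng.standard_normal((N, d))
w = rng.standard_normal((M, d))

# Single softmax: log P = h w^T - logsumexp(row) * 1^T, so rank <= d + 1.
log_single = log_softmax(h @ w.T)
p_mix = mixture_of_softmaxes(
    h, w, rng.standard_normal((K, d, d)), rng.standard_normal((d, K))
)
print(np.linalg.matrix_rank(log_single))     # bounded by d + 1 = 5
print(np.linalg.matrix_rank(np.log(p_mix)))  # not bounded by d + 1
```

Because the mixture averages probabilities rather than logits, log P is no longer a low-rank bilinear form plus a row offset, which mirrors the capacity argument the abstract makes for mixing attention.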
Taxonomy
Methods: Softmax
