Compare and Select: Video Summarization with Multi-Agent Reinforcement Learning
Tianyu Liu

TL;DR
This paper introduces CoSNet, a multi-agent reinforcement learning framework for video summarization that models user-like decision processes to handle subjectivity. It outperforms state-of-the-art unsupervised methods and, with full rewards, most supervised methods.
Contribution
It proposes a novel multi-agent reinforcement learning approach inspired by user behavior, combining comparison and selection networks for improved video summarization.
Findings
Outperforms state-of-the-art unsupervised methods
Surpasses most supervised methods with full rewards
Effective in modeling subjective user preferences
Abstract
Video summarization aims to generate concise summaries from lengthy videos to give users a better viewing experience. Because summarization is subjective, purely supervised methods may inherit errors from the annotations. To address this subjectivity, we study how general users summarize videos: they usually watch the whole video, compare interesting clips, and select some clips to form a final summary. Inspired by this behaviour, we formulate the summarization process as multiple sequential decision-making processes and propose the Comparison-Selection Network (CoSNet), based on multi-agent reinforcement learning. Each agent focuses on a video clip and constantly changes its focus during the iterations, and the final focus clips of all agents form the summary. The comparison network provides the agent with the visual…
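The agent loop described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the function name `summarize`, the three-way move set (shift one clip left, stay, shift one clip right), the even initial spacing of agents, and the random placeholder policy are all assumptions made for illustration. In CoSNet the decisions would come from the learned comparison and selection networks, which are not modelled here.

```python
import random


def summarize(num_clips, num_agents, num_iters, policy=None, seed=0):
    """Toy loop: each agent holds a focus clip index and may shift it each iteration."""
    rng = random.Random(seed)
    if policy is None:
        # Placeholder policy (assumption): pick a random move of -1, 0, or +1 clip.
        def policy(agent_id, focus, all_focus):
            return rng.choice((-1, 0, 1))

    # Spread the agents evenly across the video as a starting point (assumption).
    focus = [(i * num_clips) // num_agents for i in range(num_agents)]

    for _ in range(num_iters):
        for agent_id in range(num_agents):
            move = policy(agent_id, focus[agent_id], focus)
            # Clamp so the focus always stays on a valid clip index.
            focus[agent_id] = min(max(focus[agent_id] + move, 0), num_clips - 1)

    # The final focus clips of all agents form the summary (duplicates merged).
    return sorted(set(focus))
```

Calling `summarize(20, 4, 10)` returns at most four clip indices in the range 0 to 19. Passing a custom `policy` callable is the hook where a trained decision function could be substituted for the random placeholder.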
Taxonomy
Topics: Video Analysis and Summarization · Music and Audio Processing · Human Motion and Animation
