Agent-based Video Trimming
Lingfeng Yang, Zhenyuan Chen, Xiang Li, Peiyang Jia, Liangqu Long, Jian Yang

TL;DR
This paper introduces a novel agent-based approach for video trimming that detects wasted footage, selects valuable segments, and arranges them into a coherent story, improving video summarization and highlight detection.
Contribution
It proposes a new Video Trimming task and an agent-based framework with structured phases, including video structuring, filtering, and story composition, along with a new benchmark dataset.
Findings
AVT outperforms existing methods in user evaluations.
Demonstrates superior mAP and precision on multiple datasets.
Introduces a new benchmark dataset for video trimming.
Abstract
As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights. This trend underscores the need for an algorithm to extract key video information efficiently. Despite significant advancements in highlight detection, moment retrieval, and video summarization, current approaches primarily focus on selecting specific time intervals, often overlooking the relevance between segments and the potential for segment arranging. In this paper, we introduce a novel task called Video Trimming (VT), which focuses on detecting wasted footage, selecting valuable segments, and composing them into a final video with a coherent story. To address this task, we propose Agent-based Video Trimming (AVT), structured into three phases: Video Structuring, Clip Filtering, and Story Composition. Specifically, we…
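The three phases named in the abstract (Video Structuring, Clip Filtering, Story Composition) can be illustrated with a toy sketch. This is not the authors' implementation: the `Clip` fields, the `defect` and `highlight` scores, the fixed-length windowing, the threshold, and the chronological ordering rule are all assumptions made for illustration, standing in for whatever the agent actually produces in each phase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    start: float      # seconds
    end: float        # seconds
    caption: str      # hypothetical text description of the clip
    defect: float     # hypothetical "wasted footage" score in [0, 1]
    highlight: float  # hypothetical "valuable content" score in [0, 1]


def structure_video(duration: float, clip_len: float) -> List[Tuple[float, float]]:
    """Phase 1 (toy): split the timeline into fixed-length windows."""
    bounds = []
    t = 0.0
    while t < duration:
        bounds.append((t, min(t + clip_len, duration)))
        t += clip_len
    return bounds


def filter_clips(clips: List[Clip], max_defect: float = 0.5) -> List[Clip]:
    """Phase 2 (toy): drop clips whose defect score exceeds a threshold."""
    return [c for c in clips if c.defect <= max_defect]


def compose_story(clips: List[Clip], k: int) -> List[Clip]:
    """Phase 3 (toy): keep the k highest-highlight clips, then order them
    chronologically. Chronological order is an assumed placeholder for
    the story arrangement step."""
    top = sorted(clips, key=lambda c: c.highlight, reverse=True)[:k]
    return sorted(top, key=lambda c: c.start)


if __name__ == "__main__":
    clips = [
        Clip(0, 5, "shaky pan", defect=0.9, highlight=0.1),
        Clip(5, 10, "summit view", defect=0.1, highlight=0.9),
        Clip(10, 15, "walking", defect=0.2, highlight=0.3),
        Clip(15, 20, "group photo", defect=0.1, highlight=0.7),
    ]
    story = compose_story(filter_clips(clips), k=2)
    for c in story:
        print(f"{c.start:>5.1f}-{c.end:<5.1f} {c.caption}")
```

In this toy version the scores are given as inputs; in practice they would have to come from some model or annotation step, which the sketch deliberately leaves out.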
Taxonomy
Topics: Video Analysis and Summarization · Multimedia Communication and Technology
Methods: Contrastive Language-Image Pre-training · Focus
