Agent-based Video Trimming

Lingfeng Yang; Zhenyuan Chen; Xiang Li; Peiyang Jia; Liangqu Long; Jian Yang

arXiv:2412.09513·cs.CV·December 13, 2024


Open Access

TL;DR

This paper introduces a novel agent-based approach for video trimming that detects wasted footage, selects valuable segments, and arranges them into a coherent story, improving video summarization and highlight detection.

Contribution

The paper proposes a new Video Trimming task and an agent-based framework with three structured phases (video structuring, clip filtering, and story composition), along with a new benchmark dataset.

Findings

01. AVT outperforms existing methods in user evaluations.

02. AVT demonstrates superior mAP and precision on multiple datasets.

03. The paper introduces a new benchmark dataset for video trimming.

Abstract

As information becomes more accessible, user-generated videos are increasing in length, placing a burden on viewers to sift through vast content for valuable insights. This trend underscores the need for an algorithm to extract key video information efficiently. Despite significant advancements in highlight detection, moment retrieval, and video summarization, current approaches primarily focus on selecting specific time intervals, often overlooking the relevance between segments and the potential for segment arranging. In this paper, we introduce a novel task called Video Trimming (VT), which focuses on detecting wasted footage, selecting valuable segments, and composing them into a final video with a coherent story. To address this task, we propose Agent-based Video Trimming (AVT), structured into three phases: Video Structuring, Clip Filtering, and Story Composition. Specifically, we…
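
The three phases named in the abstract can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the `Clip` fields, the `wasted` flag, the `score` value, the `min_score` threshold, and the greedy duration budget are all hypothetical stand-ins chosen for illustration. The paper's framework is agent-based, whereas this sketch substitutes plain heuristics so that the data flow between phases is easy to follow.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Clip:
    start: float   # clip start time in seconds
    end: float     # clip end time in seconds
    caption: str   # hypothetical textual description of the clip
    wasted: bool   # hypothetical flag: True if the footage is unusable
    score: float   # hypothetical value score in [0, 1]


def structure_video(raw: List[Tuple[float, float, str, bool, float]]) -> List[Clip]:
    """Phase 1 (Video Structuring): turn raw per-clip annotations into records."""
    return [Clip(*r) for r in raw]


def filter_clips(clips: List[Clip], min_score: float = 0.5) -> List[Clip]:
    """Phase 2 (Clip Filtering): drop wasted footage and low-value clips."""
    return [c for c in clips if not c.wasted and c.score >= min_score]


def compose_story(clips: List[Clip], max_duration: float) -> List[Clip]:
    """Phase 3 (Story Composition): greedily keep the highest-scoring clips
    that fit the duration budget, then arrange them chronologically."""
    chosen: List[Clip] = []
    total = 0.0
    for c in sorted(clips, key=lambda c: c.score, reverse=True):
        length = c.end - c.start
        if total + length <= max_duration:
            chosen.append(c)
            total += length
    return sorted(chosen, key=lambda c: c.start)


def trim(raw, max_duration: float = 10.0, min_score: float = 0.5) -> List[Clip]:
    """Run the three phases in order on raw clip annotations."""
    return compose_story(filter_clips(structure_video(raw), min_score), max_duration)
```

Chronological ordering in the last phase is the simplest possible arrangement rule and is an assumption here; the abstract only states that the selected segments are composed into a final video with a coherent story.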


Taxonomy

Topics: Video Analysis and Summarization · Multimedia Communication and Technology

Methods: Contrastive Language-Image Pre-training · Focus