CutClaw: Agentic Hours-Long Video Editing via Music Synchronization

Shifang Zhao, Yihan Hu, Ying Shan, Yunchao Wei, and Xiaodong Cun

arXiv:2603.29664 · cs.CV · April 1, 2026

TL;DR

CutClaw is an autonomous multi-agent system that efficiently creates short, rhythm-aligned videos from hours-long footage by leveraging multimodal models and hierarchical decomposition.

Contribution

It introduces a novel multi-agent framework with hierarchical multimodal decomposition for automated, narrative-consistent video editing synchronized with music.
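
To make "hierarchical decomposition" concrete, here is a minimal two-level sketch: raw footage is split into fine-grained shots, and temporally adjacent shots are merged into coarser scenes when their feature vectors are similar. This is a generic illustration, not CutClaw's actual method. The `Shot`/`Scene` classes, the `group_shots_into_scenes` helper, the cosine-similarity criterion, and the 0.8 threshold are all hypothetical, and the per-shot embeddings are assumed to come from some upstream visual/audio encoder.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Shot:
    """Finest level: one contiguous shot of the raw footage (times in seconds)."""
    start: float
    end: float
    # Hypothetical per-shot feature vector, e.g. visual and audio embeddings
    # concatenated; how CutClaw actually encodes shots is not shown here.
    embedding: list[float]


@dataclass
class Scene:
    """Coarser level: a run of temporally adjacent, similar shots."""
    shots: list[Shot] = field(default_factory=list)

    @property
    def start(self) -> float:
        return self.shots[0].start

    @property
    def end(self) -> float:
        return self.shots[-1].end


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_shots_into_scenes(shots: list[Shot], threshold: float = 0.8) -> list[Scene]:
    """Greedily append each shot to the current scene when it is similar enough
    to that scene's last shot; otherwise start a new scene."""
    scenes: list[Scene] = []
    for shot in shots:
        if scenes and cosine(scenes[-1].shots[-1].embedding, shot.embedding) >= threshold:
            scenes[-1].shots.append(shot)
        else:
            scenes.append(Scene([shot]))
    return scenes


if __name__ == "__main__":
    shots = [
        Shot(0.0, 4.0, [1.0, 0.0]),
        Shot(4.0, 7.0, [0.9, 0.1]),
        Shot(7.0, 12.0, [0.0, 1.0]),
    ]
    for scene in group_shots_into_scenes(shots):
        print(f"scene {scene.start:.1f}-{scene.end:.1f}s, {len(scene.shots)} shot(s)")
```

In the toy example, the first two shots have nearly parallel embeddings and form one 0-7 s scene, while the third starts a new scene. An editing agent can then reason over a handful of scenes for global structure and drop down to individual shots for fine-grained detail.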

Findings

1. Outperforms state-of-the-art baselines in video quality and rhythm alignment.
2. Effectively captures both fine-grained details and global structures in videos.
3. Demonstrates the potential for autonomous, long-form video editing.

Abstract

Editing video content to align with audio has become a popular, human-crafted digital art form on today's social media. However, the time-consuming and repetitive nature of manual video editing has long been a challenge for filmmakers and professional content creators alike. In this paper, we introduce CutClaw, an autonomous multi-agent framework that edits hours-long raw footage into meaningful short videos by leveraging multiple Multimodal Language Models (MLLMs) as an agent system. It produces videos that are synchronized with music, follow instructions, and are visually appealing. In detail, our approach begins with a hierarchical multimodal decomposition that captures both fine-grained details and global structures across the visual and audio footage. Then, to ensure narrative consistency, a Playwriter Agent orchestrates the whole storytelling flow and structures…
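
The abstract does not spell out how cuts are aligned to the music. As a hedged illustration of the general idea only, the sketch below places clips end to end so that every cut lands exactly on a beat. It assumes beat timestamps (in seconds) are already available from some beat tracker, and that each output shot spans a fixed number of beats; the `Clip`/`Cut` classes, the `plan_beat_synced_cuts` helper, and the `beats_per_shot` parameter are hypothetical names, not part of CutClaw.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A candidate segment of the source footage (seconds)."""
    src_start: float
    src_end: float

    @property
    def duration(self) -> float:
        return self.src_end - self.src_start


@dataclass
class Cut:
    """One placed shot: which source span plays during which output span."""
    src_start: float
    src_end: float
    out_start: float
    out_end: float


def plan_beat_synced_cuts(
    beats: list[float], clips: list[Clip], beats_per_shot: int = 2
) -> list[Cut]:
    """Lay clips end to end so every cut falls exactly on a music beat.

    Each output shot spans `beats_per_shot` beat intervals; the next clip that
    is at least that long is trimmed to fit, and shorter clips are skipped.
    """
    if beats_per_shot < 1:
        raise ValueError("beats_per_shot must be >= 1")
    cuts: list[Cut] = []
    remaining = iter(clips)
    i = 0
    while i + beats_per_shot < len(beats):
        out_start, out_end = beats[i], beats[i + beats_per_shot]
        slot = out_end - out_start
        # Advance through the remaining clips until one can fill the slot.
        clip = next((c for c in remaining if c.duration >= slot), None)
        if clip is None:
            break  # out of usable footage before the music ends
        cuts.append(Cut(clip.src_start, clip.src_start + slot, out_start, out_end))
        i += beats_per_shot
    return cuts


if __name__ == "__main__":
    beats = [0.0, 0.5, 1.0, 1.5, 2.0]  # evenly spaced beats, as at 120 BPM
    clips = [Clip(10.0, 12.0), Clip(30.0, 30.5), Clip(40.0, 45.0)]
    for cut in plan_beat_synced_cuts(beats, clips):
        print(cut)
```

Here the 0.5 s clip is skipped because it cannot cover a two-beat (1 s) slot, so the output uses 10-11 s and 40-41 s of the source. A real system would also choose *which* footage to use (for example, guided by a storyline and instructions) rather than taking clips in order; this sketch covers only the timing constraint.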

Code & Models

Repository: GVCLab/CutClaw (GitHub)