AVI-Edit: Audio-sync Video Instance Editing with Granularity-Aware Mask Refiner

Haojie Zheng; Shuchen Weng; Jingqi Liu; Siqi Yang; Boxin Shi; Xinlong Wang

arXiv:2512.10571 · cs.CV · May 5, 2026

TL;DR

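AVI-Edit is a framework for audio-synchronized video instance editing. It pairs a granularity-aware mask refiner, which turns coarse user-provided masks into precise instance-level regions, with a self-feedback audio agent that curates audio guidance for fine-grained temporal control. The authors report that it outperforms state-of-the-art methods.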

Contribution

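The paper introduces AVI-Edit, which combines a granularity-aware mask refiner with a self-feedback audio agent, and a new large-scale dataset with instance-centric correspondence and comprehensive annotations. Together these enable fine-grained, audio-synchronized video editing at the instance level.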

Findings

01 AVI-Edit achieves superior visual quality compared to state-of-the-art methods.

02 AVI-Edit demonstrates improved audio-visual synchronization.

03 AVI-Edit provides fine-grained spatial and temporal control for video editing.

Abstract

Recent advancements in video generation highlight that realistic audio-visual synchronization is crucial for engaging content creation. However, existing video editing methods largely overlook audio-visual synchronization and lack the fine-grained spatial and temporal controllability required for precise instance-level edits. In this paper, we propose AVI-Edit, a framework for audio-sync video instance editing. We propose a granularity-aware mask refiner that iteratively refines coarse user-provided masks into precise instance-level regions. We further design a self-feedback audio agent to curate high-quality audio guidance, providing fine-grained temporal control. To facilitate this task, we additionally construct a large-scale dataset with instance-centric correspondence and comprehensive annotations. Extensive experiments demonstrate that AVI-Edit outperforms state-of-the-art methods…

Code & Models

Repositories

Project page: https://hjzheng.net/projects/AVI-Edit

Models

Hugging Face: suimu/AVI-Edit
