Sound Sparks Motion: Audio and Text Tuning for Video Editing

AmirHossein Naghi Razlighi; Aryan Mikaeili; Ali Mahdavi-Amiri; Daniel Cohen-Or; Yiorgos Chrysanthou

arXiv:2605.15307 · cs.GR · May 18, 2026

TL;DR

Sound Sparks Motion introduces a test-time tuning framework that enables localized motion editing in generative videos by adjusting multimodal conditioning signals, guided by vision-language feedback.

Contribution

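The paper presents a training-free, test-time tuning method that achieves motion edits by adjusting only two variables, an audio latent derived from the source video and a residual perturbation in the text conditioning, while leaving the model weights unchanged. The results suggest that multimodal models contain reusable motion controls.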

Findings

01

Effective motion editing guided by vision-language feedback.

02

Transferability of learned motion controls across videos.

03

Tuning multimodal conditioning signals is a promising approach to motion editing.

Abstract

Motion-centric video editing remains difficult for large generative video models, which often respond well to appearance changes but struggle to produce specific, localized actions or state transitions in an existing clip. We introduce Sound Sparks Motion, a training-free framework that enables motion editing in an audio-visual video generation model by tuning its internal multimodal conditioning signals at test time. Rather than modifying model weights, our method tunes only two lightweight variables: an audio latent derived from the source video and a residual perturbation in the text-conditioning. We find that this combination can encourage motion edits that the underlying model often struggles to realize under prompt-only control. Since there is no direct way to evaluate temporal alignment between text and motion, we guide the tuning process using a vision-language model that…
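The tuning idea described in the abstract can be illustrated with a toy sketch: a frozen generator, two tunable conditioning variables, and an external score that guides the search. Everything here is an assumption made for illustration. `frozen_generator` is a small fixed function standing in for the audio-visual video model, `feedback_score` is a placeholder for the vision-language model's feedback, and the gradient-free random search is a swapped-in optimizer, since the truncated abstract does not state how the tuning is actually performed.

```python
import numpy as np

# Hypothetical frozen "model weights": created once and never updated.
_rng = np.random.default_rng(0)
W_AUDIO = _rng.normal(size=(8, 4))
W_TEXT = _rng.normal(size=(8, 6))


def frozen_generator(audio_latent, text_embedding):
    """Toy stand-in for a frozen audio-visual video model (assumption)."""
    return np.tanh(W_AUDIO @ audio_latent + W_TEXT @ text_embedding)


def feedback_score(video, target):
    """Placeholder for vision-language feedback; higher is better (assumption)."""
    return -float(np.sum((video - target) ** 2))


def tune_conditioning(audio_init, text_embedding, target,
                      steps=300, sigma=0.1, seed=1):
    """Tune only the audio latent and a text residual; weights stay fixed.

    Uses simple random search (an assumed optimizer, not the paper's).
    """
    rng = np.random.default_rng(seed)
    audio = audio_init.copy()
    residual = np.zeros_like(text_embedding)  # perturbation added to the text
    initial = feedback_score(
        frozen_generator(audio, text_embedding + residual), target)
    best = initial
    for _ in range(steps):
        # Propose small perturbations of the two tunable variables.
        cand_audio = audio + sigma * rng.normal(size=audio.shape)
        cand_residual = residual + sigma * rng.normal(size=residual.shape)
        score = feedback_score(
            frozen_generator(cand_audio, text_embedding + cand_residual),
            target)
        # Keep a proposal only if the feedback score improves.
        if score > best:
            audio, residual, best = cand_audio, cand_residual, score
    return audio, residual, initial, best


if __name__ == "__main__":
    audio0 = np.zeros(4)
    text = np.full(6, 0.1)
    target = np.full(8, 0.5)
    audio, residual, initial, best = tune_conditioning(audio0, text, target)
    print(f"score before tuning: {initial:.4f}, after tuning: {best:.4f}")
```

The point of the sketch is structural: the search touches only `audio` and `residual`, and the generator's weights are read but never written. A real system would replace the toy generator and scorer with the actual video model and vision-language model.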

Code & Models

Repositories

https://amirhossein-razlighi.github.io/Sound_Sparks_Motion
