PAVAS: Physics-Aware Video-to-Audio Synthesis

Oh Hyun-Bin; Yuhta Takida; Toshimitsu Uesaka; Tae-Hyun Oh; Yuki Mitsufuji

arXiv:2512.08282 · cs.CV · March 31, 2026



TL;DR

PAVAS introduces a physics-aware approach to video-to-audio synthesis that integrates physical reasoning and estimated physical parameters into a diffusion-based model, improving the realism and physical consistency of the generated audio.

Contribution

The paper presents PAVAS, a V2A model conditioned on object-level physical parameters (the moving object's mass, inferred by a vision-language model, and its velocity, derived from dynamic 3D reconstruction), which improves the physical plausibility of the generated sounds.

Findings

01

PAVAS outperforms existing models in physical realism and perceptual quality.

02

VGG-Impact, a new benchmark, evaluates physical realism in V2A generation.

03

The Audio-Physics Correlation Coefficient (APCC) measures physical-auditory consistency.
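This page does not give the APCC formula. As a hedged illustration only, assuming APCC is a Pearson correlation between a per-clip physical quantity (for example, an estimated impact speed) and an audio feature of each generated clip (here, peak frame-level RMS loudness), it could be computed as follows. The feature choice, the frame and hop sizes, and the names `peak_rms` and `apcc_sketch` are hypothetical and not taken from the paper.

```python
import numpy as np


def peak_rms(waveform: np.ndarray, frame: int = 1024, hop: int = 512) -> float:
    """Peak frame-level RMS of a mono waveform (hypothetical audio feature)."""
    x = np.asarray(waveform, dtype=float)
    if x.size < frame:
        x = np.pad(x, (0, frame - x.size))  # zero-pad clips shorter than one frame
    starts = range(0, x.size - frame + 1, hop)
    # RMS of each analysis frame; keep the loudest one.
    return max(float(np.sqrt(np.mean(x[s:s + frame] ** 2))) for s in starts)


def apcc_sketch(physical_values, waveforms) -> float:
    """Pearson correlation between a per-clip physical quantity and peak RMS.

    Hypothetical stand-in for APCC; the paper's exact definition may differ.
    """
    phys = np.asarray(physical_values, dtype=float)
    feats = np.array([peak_rms(w) for w in waveforms])
    if phys.size != feats.size or phys.size < 2:
        raise ValueError("need at least two paired (physical value, clip) samples")
    # np.corrcoef returns the 2x2 correlation matrix; [0, 1] is r(phys, feats).
    return float(np.corrcoef(phys, feats)[0, 1])


if __name__ == "__main__":
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # 1 s, 440 Hz
    speeds = [1.0, 2.0, 3.0, 4.0]
    clips = [0.1 * s * tone for s in speeds]  # faster impacts -> louder clips
    print(apcc_sketch(speeds, clips))  # close to 1.0
```

Under this assumption, a score near 1 means louder generated sounds accompany larger physical quantities, and a score near 0 means loudness ignores the physics; a real metric would likely use features and quantities chosen by the authors.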

Abstract

Recent advances in Video-to-Audio (V2A) generation have achieved impressive perceptual quality and temporal synchronization, yet most models remain appearance-driven: they capture visual-acoustic correlations without considering the physical factors that shape real-world sounds. We present Physics-Aware Video-to-Audio Synthesis (PAVAS), a method that incorporates physical reasoning into latent diffusion-based V2A generation through the Physics-Driven Audio Adapter (Phy-Adapter). The adapter receives object-level physical parameters estimated by the Physical Parameter Estimator (PPE), which uses a Vision-Language Model (VLM) to infer the mass of the moving object and a segmentation-based dynamic 3D reconstruction module to recover its motion trajectory, from which velocity is computed. These physical cues enable the model to synthesize sounds that reflect the underlying physical factors. To assess physical…
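The abstract says the PPE recovers a motion trajectory and uses it for velocity computation. A minimal sketch of that step, assuming the trajectory arrives as a (T, 3) array of per-frame object positions sampled at the video frame rate, is a finite-difference derivative. The array layout, the helper names `estimate_velocities` and `estimate_speeds`, and the use of `np.gradient` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np


def estimate_velocities(positions: np.ndarray, fps: float) -> np.ndarray:
    """Per-frame velocity vectors from a reconstructed 3D trajectory.

    Hypothetical helper: `positions` is assumed to be a (T, 3) array of
    object positions (one row per video frame) and `fps` the frame rate.
    """
    positions = np.asarray(positions, dtype=float)
    if positions.ndim != 2 or positions.shape[1] != 3 or positions.shape[0] < 2:
        raise ValueError("expected a (T, 3) trajectory with T >= 2")
    if fps <= 0:
        raise ValueError("fps must be positive")
    dt = 1.0 / fps  # time between consecutive frames
    # Central differences in the interior, one-sided differences at both ends.
    return np.gradient(positions, dt, axis=0)


def estimate_speeds(positions: np.ndarray, fps: float) -> np.ndarray:
    """Speed (magnitude of the velocity vector) for every frame."""
    return np.linalg.norm(estimate_velocities(positions, fps), axis=1)


if __name__ == "__main__":
    # Object moving straight down at 2 units/s, sampled at 10 fps.
    t = np.arange(5) / 10.0
    traj = np.stack([np.zeros_like(t), np.zeros_like(t), -2.0 * t], axis=1)
    print(estimate_speeds(traj, fps=10.0))  # each value is close to 2.0
```

Central differences are a simple, noise-sensitive choice; a reconstruction with jittery per-frame positions would typically be smoothed before differentiation, and the speed just before contact is one plausible value to pair with the VLM-estimated mass.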


Code & Models

Repositories

https://physics-aware-video-to-audio-synthesis.github.io
