VibraVerse: A Large-Scale Geometry-Acoustics Alignment Dataset for Physically-Consistent Multimodal Learning
Bo Pang, Chenxi Xu, Jierui Ren, Guoping Wang, Sheng Li

TL;DR
VibraVerse is a large-scale dataset linking 3D geometry, physical properties, and acoustic signals, enabling physically consistent multimodal learning and causal understanding of object sounds.
Contribution
The paper introduces VibraVerse, a comprehensive geometry-acoustics dataset, together with CLASP, a framework for cross-modal alignment that enforces physical and causal consistency in multimodal learning.
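The abstract describes CLASP as a contrastive learning framework, but its exact objective is cut off in this summary. As a hedged point of reference only, the sketch below shows a generic CLIP-style symmetric InfoNCE loss between paired geometry and audio embeddings: within a batch, each object's own geometry/audio pair is the positive and every other pairing is a negative. This is not CLASP's loss. The function name `symmetric_info_nce`, the temperature of 0.07, and the use of plain NumPy arrays as stand-ins for encoder outputs are all assumptions.

```python
import numpy as np


def symmetric_info_nce(geo_emb, audio_emb, temperature=0.07):
    """Generic CLIP-style contrastive loss (illustrative; NOT the CLASP objective).

    geo_emb, audio_emb: (N, D) arrays; row i of each is assumed to come from
    the same object, so the diagonal of the similarity matrix holds positives.
    """
    g = geo_emb / np.linalg.norm(geo_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    logits = g @ a.T / temperature  # (N, N) temperature-scaled cosine similarities

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))  # target for row i is column i

    # Average the geometry->audio and audio->geometry retrieval losses.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))
```

Lower values mean each geometry embedding sits closer to its own object's audio embedding than to the other objects' embeddings in the batch; the symmetric form penalizes misalignment in both retrieval directions.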
Findings
Models trained on VibraVerse achieve higher accuracy than competing models.
VibraVerse enables better interpretability of multimodal models.
The dataset supports generalization across different modalities.
Abstract
Understanding the physical world requires perceptual models grounded in physical laws rather than mere statistical correlations. However, existing multimodal learning frameworks, focused on vision and language, lack physical consistency and overlook the intrinsic causal relationships among an object's geometry, material, vibration modes, and the sounds it produces. We introduce VibraVerse, a large-scale geometry-acoustics alignment dataset that explicitly bridges the causal chain from 3D geometry -> physical attributes -> modal parameters -> acoustic signals. Each 3D model has explicit physical properties (density, Young's modulus, Poisson's ratio) and volumetric geometry, from which modal eigenfrequencies and eigenvectors are computed for impact sound synthesis under controlled excitations. To establish this coherence, we introduce CLASP, a contrastive learning framework for…
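The causal chain in the abstract (physical attributes -> modal parameters -> acoustic signals) matches standard linear modal analysis: solve the generalized eigenproblem $K u = \omega^2 M u$ for a stiffness matrix $K$ and mass matrix $M$, then render an impact as a sum of exponentially decaying sinusoids. The sketch below illustrates that pipeline on a deliberately tiny stand-in, a 1D clamped-free bar with linear elements and lumped mass, rather than the volumetric 3D models the dataset uses. A 1D axial bar depends only on density and Young's modulus; Poisson's ratio would enter through a 3D elasticity stiffness matrix. The function names, the Rayleigh damping model and its coefficients (`alpha`, `beta`), and the choice to output displacement at the struck node instead of radiated sound pressure are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np


def bar_modes(n_elems, length, area, density, youngs):
    """Axial modes of a clamped-free 1D bar (toy stand-in for a 3D FEM mesh).

    Linear elements with a lumped (diagonal) mass matrix let the generalized
    eigenproblem K u = w^2 M u be solved with a plain symmetric eigensolver.
    Returns angular frequencies (rad/s, ascending) and mass-normalized modes.
    """
    h = length / n_elems
    k = youngs * area / h  # axial stiffness of one element
    n = n_elems  # node 0 is clamped; nodes 1..n are the free DOFs
    K = np.zeros((n, n))
    for e in range(n_elems):
        i, j = e - 1, e  # DOF indices of the element's two nodes (-1 = clamped)
        if i >= 0:
            K[i, i] += k
            K[i, j] -= k
            K[j, i] -= k
        K[j, j] += k
    m = np.full(n, density * area * h)  # lumped mass per interior node
    m[-1] *= 0.5  # the free-end node carries half an element's mass
    s = 1.0 / np.sqrt(m)
    lam, V = np.linalg.eigh(s[:, None] * K * s[None, :])  # M^-1/2 K M^-1/2
    U = s[:, None] * V  # mass-normalized: U.T @ diag(m) @ U = I
    return np.sqrt(lam), U


def impact_sound(omega, U, hit_dof, sr=44100, duration=0.5,
                 alpha=1.0, beta=1e-7, force=1.0):
    """Displacement at the struck DOF after an ideal impulse (assumed output)."""
    t = np.arange(int(sr * duration)) / sr
    d = alpha + beta * omega**2  # Rayleigh damping C = alpha*M + beta*K, per mode
    xi = d / (2.0 * omega)  # modal damping ratios
    keep = (xi < 1.0) & (omega / (2 * np.pi) < sr / 2)  # underdamped, below Nyquist
    omega, d, xi = omega[keep], d[keep], xi[keep]
    phi = U[hit_dof][keep]  # mode-shape values at the struck DOF
    wd = omega * np.sqrt(1.0 - xi**2)  # damped angular frequencies
    # An impulse F at hit_dof gives q_i(0) = 0 and dq_i/dt(0) = F * phi_i, so
    # q_i(t) = F * phi_i / wd_i * exp(-d_i t / 2) * sin(wd_i t); we observe the
    # displacement at the same DOF, i.e. weight each mode by phi_i again.
    amps = force * phi * phi / wd
    return (amps[:, None] * np.exp(-0.5 * d[:, None] * t)
            * np.sin(wd[:, None] * t)).sum(axis=0)
```

For a 1 m steel bar (E = 200 GPa, density 7800 kg/m³), the lowest mode should approach the analytic clamped-free value c/(4L) with c = sqrt(E/density), about 1266 Hz, which is a quick sanity check on the discretization; calling `impact_sound(omega, U, hit_dof=n_elems - 1)` then strikes the free end.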
Taxonomy
Topics: 3D Shape Modeling and Analysis · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
