VibraVerse: A Large-Scale Geometry-Acoustics Alignment Dataset for Physically-Consistent Multimodal Learning

Bo Pang; Chenxi Xu; Jierui Ren; Guoping Wang; Sheng Li

arXiv:2511.20422 · cs.AI · November 26, 2025

TL;DR

VibraVerse is a large-scale dataset linking 3D geometry, physical properties, and acoustic signals, enabling physically consistent multimodal learning and causal understanding of object sounds.

Contribution

The paper introduces VibraVerse, a comprehensive dataset and CLASP framework for cross-modal alignment that enforces physical and causal consistency in multimodal learning.
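This listing does not spell out the CLASP objective, so the following is only a generic sketch of contrastive cross-modal alignment: a symmetric InfoNCE loss over paired shape and audio embeddings. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration and are not taken from the paper.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(shape_embs, audio_embs, temperature=0.07):
    """Symmetric InfoNCE loss; pair i is (shape_embs[i], audio_embs[i]).

    Generic contrastive objective, NOT the paper's CLASP loss.
    The temperature default is an assumed, commonly used value.
    """
    shapes = [normalize(v) for v in shape_embs]
    audios = [normalize(v) for v in audio_embs]
    n = len(shapes)

    # logits[i][j] = cosine similarity of shape i and audio j, over temperature.
    logits = [
        [sum(x * y for x, y in zip(shapes[i], audios[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i], using log-sum-exp for stability.
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Transpose to score the audio-to-shape direction as well.
    columns = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(columns))


# Toy batch (hypothetical values): matched pairs versus swapped pairs.
matched = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = info_nce(matched, matched)
loss_swapped = info_nce(matched, swapped)
```

The loss is small when each shape embedding is closest to its own audio embedding and grows when the pairing is scrambled, which is the behavior any alignment objective of this family relies on.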

Findings

01. Models trained on VibraVerse outperform others in accuracy.

02. VibraVerse enables better interpretability of multimodal models.

03. The dataset supports generalization across different modalities.

Abstract

Understanding the physical world requires perceptual models grounded in physical laws rather than mere statistical correlations. However, existing multimodal learning frameworks, focused on vision and language, lack physical consistency and overlook the intrinsic causal relationships among an object's geometry, material, vibration modes, and the sounds it produces. We introduce VibraVerse, a large-scale geometry-acoustics alignment dataset that explicitly bridges the causal chain from 3D geometry -> physical attributes -> modal parameters -> acoustic signals. Each 3D model has explicit physical properties (density, Young's modulus, Poisson's ratio) and volumetric geometry, from which modal eigenfrequencies and eigenvectors are computed for impact sound synthesis under controlled excitations. To establish this coherence, we introduce CLASP, a contrastive learning framework for…
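
The chain from physical attributes to modal parameters to sound can be illustrated with a minimal sketch of standard modal synthesis, assuming the eigenfrequencies are already known. It is not the authors' pipeline. It relies on two textbook facts: for linear elasticity with fixed geometry and Poisson's ratio, every eigenfrequency scales with sqrt(E / rho), and an impact response is a sum of exponentially decaying sinusoids, one per mode. All numeric values, the per-mode damping rates, and the gains (which in a full pipeline would come from the eigenvector at the strike point) are hypothetical.

```python
import math


def scale_frequencies(ref_freqs_hz, ref_youngs, ref_density, youngs, density):
    """Rescale eigenfrequencies to a new material on the same geometry.

    Stiffness scales with Young's modulus E and mass with density rho,
    so each frequency scales with sqrt(E / rho). Assumes an unchanged
    shape and Poisson's ratio.
    """
    factor = math.sqrt((youngs / density) / (ref_youngs / ref_density))
    return [f * factor for f in ref_freqs_hz]


def synthesize_impact(freqs_hz, dampings, gains, duration_s=1.0,
                      sample_rate=44100):
    """Return samples of a sum of damped sinusoids, one per vibration mode.

    dampings are decay rates in 1/s; gains are per-mode amplitudes.
    """
    n = int(duration_s * sample_rate)
    samples = [0.0] * n
    for freq, damping, gain in zip(freqs_hz, dampings, gains):
        for i in range(n):
            t = i / sample_rate
            samples[i] += (gain * math.exp(-damping * t)
                           * math.sin(2.0 * math.pi * freq * t))
    return samples


# Hypothetical reference modes for some shape and reference material.
reference_modes_hz = [440.0, 1130.0, 2050.0]

# Same shape, a material four times stiffer at equal density:
# every mode moves up by a factor of sqrt(4) = 2.
stiffer_modes_hz = scale_frequencies(reference_modes_hz,
                                     ref_youngs=1.0, ref_density=1.0,
                                     youngs=4.0, density=1.0)

# Short impact sound with assumed damping rates and gains.
impact = synthesize_impact(stiffer_modes_hz,
                           dampings=[6.0, 10.0, 18.0],
                           gains=[1.0, 0.5, 0.25],
                           duration_s=0.25, sample_rate=8000)
```

Because the sound is determined by the modal parameters, and those by shape and material, changing the material alone shifts the whole spectrum in a predictable way. This is the kind of physical consistency the abstract refers to.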

Code & Models

Datasets

technetium66/VibraVerse
dataset · 73 downloads

Taxonomy

Topics: 3D Shape Modeling and Analysis · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis