A Survey on World Models Grounded in Acoustic Physical Information
Xiaoliang Chen, Le Chang, Xin Yu, Yunhe Huang, Xianling Tu

TL;DR
This survey reviews the development of acoustic-based world models, emphasizing theoretical foundations, methodological frameworks, and applications in various fields, aiming to advance embodied acoustic intelligence.
Contribution
It provides a comprehensive overview of acoustic world models, integrating physical laws, neural networks, and multimodal learning, and outlines future research directions and challenges.
Findings
Acoustic signals encode rich physical information about environments.
Core methodologies include physics-informed neural networks (PINNs), generative models, and self-supervised learning.
Applications span robotics, autonomous driving, healthcare, and finance.
Abstract
This survey provides a comprehensive overview of the emerging field of world models grounded in acoustic physical information. It examines the theoretical underpinnings, essential methodological frameworks, and recent technological advances in leveraging acoustic signals for high-fidelity environmental perception, causal physical reasoning, and predictive simulation of dynamic events. The survey explains how acoustic signals, as direct carriers of mechanical wave energy from physical events, encode rich latent information about material properties, internal geometric structures, and complex interaction dynamics. Specifically, it establishes the theoretical foundation by explaining how fundamental physical laws govern the encoding of physical information within acoustic signals. It then reviews the core methodological pillars, including Physics-Informed…
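The following is a minimal sketch of the physics-informed idea the abstract points to. It assumes the simplest governing law, the 1D linear acoustic wave equation u_tt = c^2 u_xx with constant speed c, and computes the physics-residual term that a PINN would minimize. A real PINN obtains derivatives by automatic differentiation of a neural network. Here, central finite differences and hand-written candidate fields stand in for that. The names `wave_residual` and `physics_loss`, the collocation grid, and the speed values are hypothetical illustrations and are not taken from the survey.

```python
import math
from typing import Callable, Sequence, Tuple

# A candidate pressure field u(x, t); in a real PINN this would be a neural network.
Field = Callable[[float, float], float]


def wave_residual(u: Field, x: float, t: float, c: float, h: float = 1e-3) -> float:
    """Residual u_tt - c^2 * u_xx of the 1D wave equation at (x, t).

    Second derivatives are approximated with central finite differences
    (a stand-in for the automatic differentiation a PINN would use).
    """
    u_tt = (u(x, t + h) - 2.0 * u(x, t) + u(x, t - h)) / h**2
    u_xx = (u(x + h, t) - 2.0 * u(x, t) + u(x - h, t)) / h**2
    return u_tt - c**2 * u_xx


def physics_loss(u: Field, points: Sequence[Tuple[float, float]], c: float) -> float:
    """Mean squared wave-equation residual over collocation points (x, t)."""
    residuals = [wave_residual(u, x, t, c) for x, t in points]
    return sum(r * r for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Hypothetical collocation grid in nondimensional units.
    points = [(0.3 * i, 0.2 * j) for i in range(10) for j in range(10)]
    c = 2.0
    # sin(x - c t) is an exact traveling-wave solution, so only truncation error remains.
    consistent = lambda x, t: math.sin(x - c * t)
    # A wave traveling at the wrong speed violates the assumed physics.
    inconsistent = lambda x, t: math.sin(x - 1.5 * t)
    print(physics_loss(consistent, points, c))    # close to zero
    print(physics_loss(inconsistent, points, c))  # clearly positive
```

In practice, this physics term is typically added to a data-fitting term on measured signals. Treating c as a trainable parameter is one way such a model could infer a property of the propagating medium from sound.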
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
