Teaching Physical Awareness to LLMs through Sounds
Weiguo Wang, Andy Nie, Wenrui Zhou, Yi Kai, Chengchen Hu

TL;DR
This paper introduces ACORN, a framework that teaches LLMs physical awareness through sound by using a physics-based simulator and an audio encoder, enabling understanding of phenomena like the Doppler effect and spatial relationships.
Contribution
The paper presents ACORN, a novel framework combining a physics-based sound simulator and an audio encoder to enhance LLMs' physical understanding.
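To make one of the physical phenomena mentioned here concrete, the following is a minimal sketch of the textbook Doppler formula for a moving source and a stationary listener. It is not ACORN's simulator; the function name, the parameter names, and the 343 m/s speed of sound are assumptions made for illustration only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value: dry air at roughly 20 degrees Celsius


def doppler_observed_frequency(f_source_hz, v_source_m_s, c=SPEED_OF_SOUND_M_S):
    """Frequency heard by a stationary listener (hypothetical helper).

    v_source_m_s is the source speed along the line to the listener:
    positive when approaching, negative when receding.
    """
    if abs(v_source_m_s) >= c:
        raise ValueError("source speed must be below the speed of sound")
    # Classic moving-source Doppler relation: f_obs = f * c / (c - v)
    return f_source_hz * c / (c - v_source_m_s)


if __name__ == "__main__":
    for v in (-34.3, 0.0, 34.3):
        f_obs = doppler_observed_frequency(1000.0, v)
        print(f"v = {v:+.1f} m/s -> observed {f_obs:.1f} Hz")
```

An approaching source is heard at a higher pitch and a receding one at a lower pitch, which is the kind of cue a model would need to pick up from audio alone.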
Findings
Reasonable results in both simulated and real-world physical tasks
Effective training data generation with the simulator
Improved LLM understanding of physical phenomena
Abstract
Large Language Models (LLMs) have shown remarkable capabilities in text and multimodal processing, yet they fundamentally lack physical awareness, that is, an understanding of real-world physical phenomena. In this work, we present ACORN, a framework that teaches LLMs physical awareness through sound, focusing on fundamental physical phenomena like the Doppler effect, multipath effect, and spatial relationships. To overcome data scarcity, ACORN introduces a physics-based simulator combining real-world sound sources with controlled physical channels to generate diverse training data. Using this simulator, we build AQA-PHY, a comprehensive Audio Question-Answer dataset, and propose an audio encoder that processes both magnitude and phase information. By connecting our audio encoder to state-of-the-art LLMs, we demonstrate reasonable results in both simulated and real-world tasks, such as line-of-sight…
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech Recognition and Synthesis
