Speech World Model: Causal State-Action Planning with Explicit Reasoning for Speech
Xuanru Zhou, Jiachen Lian, Henry Hong, Xinyi Yang, Gopala Anumanchipalli

TL;DR
This paper introduces a novel modular speech understanding model that explicitly reasons over speech states and actions using a causal graph, enhancing interpretability and reasoning capabilities beyond traditional black-box approaches.
Contribution
It proposes the first graph-based modular speech model that incorporates explicit causal reasoning and state-action planning, inspired by cognitive science principles.
Findings
First graph-based modular speech model for explicit reasoning
Enables counterfactual interventions and interpretability
Open-sourced model and data to foster further research
Abstract
Current speech-language models (SLMs) typically use a cascade of a speech encoder and a large language model, treating speech understanding as a single black box. They analyze the content of speech well but reason weakly about other aspects, especially under sparse supervision. Thus, we argue for explicit reasoning over speech states and actions with modular and transparent decisions. Inspired by cognitive science, we adopt a modular perspective and a world model view in which the system learns forward dynamics over latent states. We factorize speech understanding into four modules that communicate through a causal graph, establishing a cognitive state search space. Guided by posterior traces from this space, an instruction-tuned language model produces a concise causal analysis and a user-facing response, enabling counterfactual interventions and interpretability under partial supervision.…
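The abstract mentions counterfactual interventions over a causal graph that connects modules. The four modules and their mechanisms are not specified here, so the following is only a toy sketch under loudly stated assumptions: four placeholder nodes `m1` to `m4`, a hypothetical DAG, and hypothetical linear mechanisms with additive noise. It illustrates the standard abduction-action-prediction recipe for a counterfactual in a structural causal model; it is not the authors' implementation.

```python
from typing import Callable, Dict, List, Optional

# HYPOTHETICAL: a 4-node DAG standing in for four modules; names are placeholders.
PARENTS: Dict[str, List[str]] = {
    "m1": [],
    "m2": ["m1"],
    "m3": ["m1"],
    "m4": ["m2", "m3"],
}
ORDER = ["m1", "m2", "m3", "m4"]  # a topological order of PARENTS

# HYPOTHETICAL mechanisms f_i(parents); each node also receives additive noise u_i.
MECHANISMS: Dict[str, Callable[[Dict[str, float]], float]] = {
    "m1": lambda v: 0.0,
    "m2": lambda v: 2.0 * v["m1"],
    "m3": lambda v: -1.0 * v["m1"],
    "m4": lambda v: v["m2"] + v["m3"],
}


def forward(noise: Dict[str, float],
            interventions: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Evaluate nodes in topological order; intervened nodes are clamped (do-operation)."""
    interventions = interventions or {}
    values: Dict[str, float] = {}
    for node in ORDER:
        if node in interventions:
            values[node] = interventions[node]
        else:
            values[node] = MECHANISMS[node](values) + noise[node]
    return values


def abduct(observed: Dict[str, float]) -> Dict[str, float]:
    """Abduction: recover each node's noise term from a fully observed sample."""
    return {node: observed[node] - MECHANISMS[node](observed) for node in ORDER}


def counterfactual(observed: Dict[str, float],
                   interventions: Dict[str, float]) -> Dict[str, float]:
    """Abduction, then action (clamp nodes), then prediction (re-run downstream nodes)."""
    return forward(abduct(observed), interventions)


observed = forward({"m1": 1.0, "m2": 0.5, "m3": 0.0, "m4": 0.0})
cf = counterfactual(observed, {"m1": 0.0})
print(observed)  # {'m1': 1.0, 'm2': 2.5, 'm3': -1.0, 'm4': 1.5}
print(cf)        # m1=0.0, m2=0.5, m3=0.0 (may print as -0.0), m4=0.5
```

Because the noise is inferred from the observed sample before the intervention, nodes downstream of `m1` keep their sample-specific deviations, which is what distinguishes a counterfactual query from a plain interventional one.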
Taxonomy
Topics: Speech and dialogue systems · Multimodal Machine Learning Applications · AI-based Problem Solving and Planning
