HiVAE: Hierarchical Latent Variables for Scalable Theory of Mind
Nigel Doering, Rahath Malladi, Arshia Sangwan, David Danks, Tauhidur Rahman

TL;DR
HiVAE is a hierarchical variational autoencoder architecture that scales theory of mind reasoning to complex, realistic environments. It achieves substantial performance improvements, but its learned latent representations are not explicitly grounded in actual mental states.
Contribution
The paper presents HiVAE, a three-level hierarchical VAE inspired by the belief-desire-intention structure of human cognition, which scales theory of mind reasoning to realistic spatiotemporal domains.
Findings
Achieves substantial performance improvements on a 3,185-node campus navigation task.
Hierarchical structure enhances prediction accuracy.
Latent representations lack explicit grounding to mental states.
Abstract
Theory of mind (ToM) enables AI systems to infer agents' hidden goals and mental states, but existing approaches focus mainly on small, human-understandable gridworld spaces. We introduce HiVAE, a hierarchical variational architecture that scales ToM reasoning to realistic spatiotemporal domains. Inspired by the belief-desire-intention structure of human cognition, our three-level VAE hierarchy achieves substantial performance improvements on a 3,185-node campus navigation task. However, we identify a critical limitation: while our hierarchical structure improves prediction, learned latent representations lack explicit grounding to actual mental states. We propose self-supervised alignment strategies and present this work to solicit community feedback on grounding approaches.
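To make the idea of a three-level VAE hierarchy concrete, the following is a minimal sketch of the KL term such a model might optimize. It is not the authors' implementation: the abstract does not specify the networks, the latent sizes, or how levels map to belief, desire, and intention, so the level names, `LATENT_DIM`, `INPUT_DIM`, the linear `gaussian_head`, and the top-down conditioning scheme are all assumptions made for illustration. A full training objective would also include a reconstruction or prediction term, which is omitted here.

```python
import math
import random

LATENT_DIM = 4   # hypothetical latent size per level
INPUT_DIM = 8    # hypothetical size of the observed trajectory features
LEVELS = ("belief", "desire", "intention")  # assumed naming, ordered top to bottom


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians given as lists of floats."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def gaussian_head(inputs, weights):
    """Linear map to (mu, logvar); a stand-in for a learned network."""
    out = [sum(w * v for w, v in zip(row, inputs)) for row in weights]
    mu, logvar = out[:LATENT_DIM], out[LATENT_DIM:]
    return mu, [max(-5.0, min(5.0, lv)) for lv in logvar]  # clamp for stability


def init_weights(rng):
    """Random, untrained weights: a posterior head per level, a prior head below the top."""
    def matrix(n_in):
        return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(2 * LATENT_DIM)]

    weights = {}
    for i, name in enumerate(LEVELS):
        weights["post_" + name] = matrix(INPUT_DIM + (LATENT_DIM if i > 0 else 0))
        if i > 0:
            weights["prior_" + name] = matrix(LATENT_DIM)
    return weights


def hierarchical_kl(x, weights, rng):
    """One top-down pass; returns the summed KL over levels and one sample per level."""
    total_kl, parent, latents = 0.0, None, {}
    for name in LEVELS:
        if parent is None:
            # Top level: standard normal prior, posterior sees only the input.
            mu_p, logvar_p = [0.0] * LATENT_DIM, [0.0] * LATENT_DIM
            post_in = list(x)
        else:
            # Lower levels: prior conditioned on the parent latent sample.
            mu_p, logvar_p = gaussian_head(parent, weights["prior_" + name])
            post_in = list(x) + parent
        mu_q, logvar_q = gaussian_head(post_in, weights["post_" + name])
        total_kl += gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
        # Reparameterized sample passed down to the next level.
        parent = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
                  for m, lv in zip(mu_q, logvar_q)]
        latents[name] = parent
    return total_kl, latents


rng = random.Random(0)
weights = init_weights(rng)
x = [rng.gauss(0.0, 1.0) for _ in range(INPUT_DIM)]
kl, latents = hierarchical_kl(x, weights, rng)
```

Nothing in this sketch ties a latent to an actual mental state: the level names are only labels on otherwise unconstrained vectors, which is the grounding limitation the abstract describes.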
Taxonomy
Topics: Child and Animal Learning Development · Embodied and Extended Cognition · Multimodal Machine Learning Applications
