Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation
Gyan Tatiya, Jonathan Francis, Luca Bondi, Ingrid Navarro, Eric Nyberg, Jivko Sinapov, Jean Oh

TL;DR
This paper introduces knowledge-driven scene priors and a new evaluation sub-task for semantic audio-visual embodied navigation, significantly improving agents' ability to generalize to unseen regions and novel sounding objects in simulation.
Contribution
The paper proposes a framework that combines semantic, spatial, and background knowledge with reinforcement learning to improve generalization in semantic audio-visual navigation (SAVi) tasks, and it defines a new sub-task that evaluates agents on novel sounding objects.
Findings
Enhanced generalization to unseen regions
Improved handling of novel sounding objects
Superior performance over baselines in simulation
Abstract
Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes as well as generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
