Knowledge-driven Scene Priors for Semantic Audio-Visual Embodied Navigation

Gyan Tatiya; Jonathan Francis; Luca Bondi; Ingrid Navarro; Eric Nyberg; Jivko Sinapov; Jean Oh

arXiv:2212.11345 · cs.RO · December 23, 2022 · 1 citation

Open Access

TL;DR

This paper introduces knowledge-driven scene priors and a new evaluation sub-task for semantic audio-visual embodied navigation, significantly improving agents' ability to generalize to unseen regions and novel sounding objects in simulation.

Contribution

The paper proposes a framework that combines semantic, spatial, and background knowledge with reinforcement learning to improve generalization in semantic audio-visual navigation (SAVi) tasks, and it defines a new sub-task for evaluating agents on novel sounding objects.

Findings

01

Enhanced generalization to unseen regions

02

Improved handling of novel sounding objects

03

Superior performance over baselines in simulation

Abstract

Generalisation to unseen contexts remains a challenge for embodied navigation agents. In the context of semantic audio-visual navigation (SAVi) tasks, the notion of generalisation should include both generalising to unseen indoor visual scenes as well as generalising to unheard sounding objects. However, previous SAVi task definitions do not include evaluation conditions on truly novel sounding objects, resorting instead to evaluating agents on unheard sound clips of known objects; meanwhile, previous SAVi methods do not include explicit mechanisms for incorporating domain knowledge about object and region semantics. These weaknesses limit the development and assessment of models' abilities to generalise their learned experience. In this work, we introduce the use of knowledge-driven scene priors in the semantic audio-visual embodied navigation task: we combine semantic information from…

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies