A General Purpose Supervisory Signal for Embodied Agents
Kunal Pratap Singh, Jordi Salvador, Luca Weihs, Aniruddha Kembhavi

TL;DR
This paper introduces the Scene Graph Contrastive (SGC) loss, a training-only supervisory signal for embodied AI agents. It uses contrastive learning to align an agent's representation with a scene-graph encoding of its environment, which improves performance on navigation tasks.
Contribution
The paper presents the SGC loss, a general-purpose contrastive objective that encourages an agent's representation to encode objects' semantics, relationships, and history without explicit graph decoding.
Findings
SGC loss improves performance on Object Navigation tasks
Enhanced encoding of object semantics and relationships
Significant gains across multiple embodied navigation tasks
Abstract
Training effective embodied AI agents often involves manual reward engineering, expert imitation, specialized components such as maps, or leveraging additional sensors for depth and localization. Another approach is to use neural architectures alongside self-supervised objectives which encourage better representation learning. In practice, there are few guarantees that these self-supervised objectives encode task-relevant information. We propose the Scene Graph Contrastive (SGC) loss, which uses scene graphs as general-purpose, training-only, supervisory signals. The SGC loss does away with explicit graph decoding and instead uses contrastive learning to align an agent's representation with a rich graphical encoding of its environment. The SGC loss is generally applicable, simple to implement, and encourages representations that encode objects' semantics, relationships, and history.…
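The contrastive alignment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the listing does not give the exact loss form or the encoders, so the code below assumes a generic symmetric InfoNCE objective over a batch of paired embeddings. The function names, the `temperature` value, and the toy vectors are all hypothetical.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def log_softmax_at(logits, index):
    """Numerically stable log-softmax evaluated at a single index."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum

def contrastive_alignment_loss(agent_embs, graph_embs, temperature=0.1):
    """Symmetric InfoNCE over paired embeddings (an assumed loss form).

    agent_embs[i] and graph_embs[i] are treated as a matched pair;
    every other pairing in the batch acts as a negative.
    """
    a = [l2_normalize(v) for v in agent_embs]
    g = [l2_normalize(v) for v in graph_embs]
    n = len(a)
    # Cosine similarities between every agent/graph pair, scaled by temperature.
    sim = [[sum(x * y for x, y in zip(a[i], g[j])) / temperature
            for j in range(n)] for i in range(n)]
    # Agent-to-graph direction: each row should peak on its diagonal entry.
    loss_a2g = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Graph-to-agent direction: each column should peak on its diagonal entry.
    loss_g2a = -sum(log_softmax_at([sim[i][j] for i in range(n)], j)
                    for j in range(n)) / n
    return 0.5 * (loss_a2g + loss_g2a)

# Toy batch: stand-ins for agent-state and scene-graph embeddings.
agent = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_alignment_loss(agent, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_alignment_loss(agent, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is near zero when each agent embedding matches its paired graph embedding and is large when the pairs are swapped. That is the sense in which a contrastive objective can pull the agent's representation toward the graph encoding without decoding the graph itself.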
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: ALIGN · Contrastive Learning
