Contextual Graph Representations for Task-Driven 3D Perception and Planning

Christopher Agia

arXiv:2603.26685·cs.RO·March 31, 2026

TL;DR

This thesis investigates how 3D scene graphs and graph neural networks can make robot task planning more efficient, proposing a benchmark and learning-based methods for handling large relational scene representations.

Contribution

It introduces a benchmark for comparing classical planners and explores graph neural networks to learn invariant representations for faster planning.

Findings

01. Constructed a benchmark for empirical comparison of classical planners.

02. Explored graph neural networks to learn invariant relational representations.

03. Assessed the suitability of existing environments for robot task planning with 3D scene graphs.

Abstract

Recent advances in computer vision facilitate fully automatic extraction of object-centric relational representations from visual-inertial data. These state representations, dubbed 3D scene graphs, are a hierarchical decomposition of real-world scenes with a dense multiplex graph structure. While 3D scene graphs claim to promote efficient task planning for robot systems, they contain numerous objects and relations when only small subsets are required for a given task. This magnifies the state space that task planners must operate over and prohibits deployment in resource-constrained settings. This thesis tests the suitability of existing embodied AI environments for research at the intersection of robot task planning and 3D scene graphs and constructs a benchmark for empirical comparison of state-of-the-art classical planners. Furthermore, we explore the use of graph neural networks to…
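
To make the state-space concern concrete, the following is a minimal toy sketch of a hierarchical, multiplex scene graph and of pruning it down to the objects a task mentions. It is an illustration only and not the method of the thesis: the `SceneGraph` class, the relation names (`contains`, `on`, `near`), the example scene, and the hand-written relevance rule (keep the task objects and their containment ancestors) are all assumptions introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Toy scene graph (hypothetical structure, for illustration only)."""

    # Node name -> hierarchy layer, e.g. "building", "room", "object".
    layers: dict[str, str] = field(default_factory=dict)
    # Multiplex edges: relation type -> set of (source, target) pairs.
    edges: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def add_node(self, name: str, layer: str) -> None:
        self.layers[name] = layer

    def add_edge(self, relation: str, src: str, dst: str) -> None:
        self.edges.setdefault(relation, set()).add((src, dst))

    def ancestors(self, name: str) -> list[str]:
        """Walk 'contains' edges upward; assumes each node has one parent."""
        parents = {child: parent for parent, child in self.edges.get("contains", set())}
        found = []
        while name in parents:
            name = parents[name]
            found.append(name)
        return found

    def task_subgraph(self, task_objects: list[str]) -> SceneGraph:
        """Keep only the task objects, their ancestors, and edges among them."""
        keep = set(task_objects)
        for obj in task_objects:
            keep.update(self.ancestors(obj))
        sub = SceneGraph()
        for name in keep:
            sub.add_node(name, self.layers[name])
        for relation, pairs in self.edges.items():
            for src, dst in pairs:
                if src in keep and dst in keep:
                    sub.add_edge(relation, src, dst)
        return sub


def build_example() -> SceneGraph:
    """Small made-up scene: one building, two rooms, five objects."""
    g = SceneGraph()
    g.add_node("house", "building")
    for room in ("kitchen", "bedroom"):
        g.add_node(room, "room")
        g.add_edge("contains", "house", room)
    placements = [("mug", "kitchen"), ("table", "kitchen"), ("sink", "kitchen"),
                  ("lamp", "bedroom"), ("bed", "bedroom")]
    for obj, room in placements:
        g.add_node(obj, "object")
        g.add_edge("contains", room, obj)
    g.add_edge("on", "mug", "table")
    g.add_edge("near", "table", "sink")
    g.add_edge("near", "lamp", "bed")
    return g


if __name__ == "__main__":
    full = build_example()
    pruned = full.task_subgraph(["mug", "sink"])
    print(len(full.layers), "nodes in full graph;", len(pruned.layers), "after pruning")
```

For a hypothetical task that only involves the mug and the sink, the pruned graph drops the bedroom and its contents entirely, which is the kind of reduction a planner would benefit from. Here the relevance rule is fixed by hand; per the summary above, the thesis instead explores graph neural networks for learning such representations.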
