Modeling Dynamic Environments with Scene Graph Memory
Andrey Kurenkov, Michael Lingelbach, Tanmay Agarwal, Emily Jin, Chengshu Li, Ruohan Zhang, Li Fei-Fei, Jiajun Wu, Silvio Savarese, Roberto Martín-Martín

TL;DR
This paper introduces Scene Graph Memory and a neural network architecture to predict object locations in dynamic, partially observable environments, improving search efficiency for embodied AI agents.
Contribution
It presents a novel scene graph memory representation and a Node Edge Predictor model for link prediction in dynamic, partially observable graphs, addressing a key challenge in embodied AI.
Findings
NEP outperforms baselines in diverse environments
SGM effectively captures accumulated observations
Method adapts well to various object movement dynamics
Abstract
Embodied AI agents that search for objects in large environments such as households often need to make efficient decisions by predicting object locations based on partial information. We pose this as a new type of link prediction problem: link prediction on partially observable dynamic graphs. Our graph is a representation of a scene in which rooms and objects are nodes, and their relationships are encoded in the edges; only parts of the changing graph are known to the agent at each timestep. This partial observability poses a challenge to existing link prediction approaches, which we address. We propose a novel state representation -- Scene Graph Memory (SGM) -- which captures the agent's accumulated set of observations, as well as a neural net architecture called a Node Edge Predictor (NEP) that extracts information from the SGM to search efficiently. We evaluate our method in the…
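To make the idea of accumulating partial observations of a changing graph concrete, here is a minimal Python sketch. It is not the authors' SGM or NEP: the class name, method names, and the count-based ranking are hypothetical, and a simple frequency heuristic stands in for the learned predictor described in the abstract.

```python
from collections import defaultdict


class SceneGraphMemory:
    """Toy memory of observed (object, location) edges.

    Hypothetical illustration only; not the paper's implementation.
    """

    def __init__(self):
        # edge -> number of timesteps at which the edge was observed
        self.seen_count = defaultdict(int)
        # edge -> most recent timestep at which the edge was observed
        self.last_seen = {}

    def observe(self, timestep, visible_edges):
        """Record the edges the agent can see at this timestep."""
        for edge in visible_edges:
            self.seen_count[edge] += 1
            self.last_seen[edge] = timestep

    def rank_locations(self, obj):
        """Rank candidate locations for obj by how often it was seen there.

        Assumption: a frequency count replaces the learned link predictor.
        """
        candidates = [
            (loc, count)
            for (o, loc), count in self.seen_count.items()
            if o == obj
        ]
        return [loc for loc, _ in sorted(candidates, key=lambda c: -c[1])]


memory = SceneGraphMemory()
memory.observe(0, [("mug", "kitchen_counter"), ("book", "shelf")])
memory.observe(1, [("mug", "kitchen_counter")])
memory.observe(2, [("mug", "desk")])
ranking = memory.rank_locations("mug")
```

An agent searching for the mug would check the locations in `ranking` in order. A learned model would replace the count-based ranking with predicted edge probabilities computed from the stored observations.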
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Mobility and Location-Based Analysis · Human Pose and Action Recognition
