Learning Object-Based State Estimators for Household Robots
Yilun Du, Tomas Lozano-Perez, Leslie Kaelbling

TL;DR
This paper introduces a neural network-based object memory system for household robots that learns to track and recall objects over time, even as they move, improving long-term object retrieval in dynamic environments.
Contribution
It combines classic data-association filtering with attention-based neural networks to create an end-to-end trainable object memory system for dynamic household environments.
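The idea of pairing data association with attention can be illustrated with a minimal sketch. This is not the authors' model: it assumes a fixed set of object hypotheses stored as vectors, uses negative squared distance as a hand-written attention score in place of a learned network, and applies a fixed update gain. The names `soft_associate_and_update`, `gain`, and `temperature` are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def soft_associate_and_update(hypotheses, observation, gain=0.5, temperature=1.0):
    """Softly assign one observation to K object hypotheses and update them.

    hypotheses:  (K, D) array, one row per tracked-object estimate (assumed layout).
    observation: (D,) array.
    Returns the updated (K, D) hypotheses and the (K,) attention weights.
    """
    # Attention score: hypotheses closer to the observation score higher.
    scores = -np.sum((hypotheses - observation) ** 2, axis=1) / temperature
    weights = softmax(scores)  # sums to 1 across hypotheses
    # Each hypothesis moves toward the observation in proportion to its weight.
    updated = hypotheses + gain * weights[:, None] * (observation - hypotheses)
    return updated, weights

# Two tracked objects; the observation lies near the first one.
hyps = np.array([[0.0, 0.0], [5.0, 5.0]])
obs = np.array([0.2, -0.1])
new_hyps, w = soft_associate_and_update(hyps, obs)
```

In this sketch the observation is attributed almost entirely to the nearby hypothesis, which shifts toward it, while the distant hypothesis is left essentially unchanged. A learned system would replace the distance-based score and fixed gain with trained components.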
Findings
The system maintains memory of dynamically changing objects in simulated environments.
It outperforms both classical and unstructured neural methods.
It also works on real images, which suggests practical applicability.
Abstract
A robot operating in a household makes observations of multiple objects as it moves around over the course of days or weeks. The objects may be moved by inhabitants, but not completely at random. The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them. Existing work in semantic SLAM does not attempt to capture the dynamics of object movement. In this paper, we combine some aspects of classic techniques for data-association filtering with modern attention-based neural networks to construct object-based memory systems that operate on high-dimensional observations and hypotheses. We perform end-to-end learning on labeled observation trajectories to learn both the transition and observation models. We demonstrate the system's effectiveness in maintaining memory of dynamically changing objects in both simulated…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition
