Amortized Object and Scene Perception for Long-term Robot Manipulation
Ferenc Balint-Benczedi, Michael Beetz

TL;DR
This paper presents a perception system for long-term robot manipulation that maintains a dynamic world model by asynchronously integrating perception results, enabling robots to track and query objects and scenes over time.
Contribution
It introduces an amortized perception component that spreads perception tasks across the robot's execution cycle and combines symbolic and sub-symbolic (numeric) representations for persistent scene understanding.
Findings
Enables robots to maintain a consistent world model over time.
Supports querying past and current scenes for manipulation tasks.
Improves perception efficiency through amortized processing.
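The idea summarized above (deferring perception work, integrating results asynchronously, and answering queries about past and current scenes) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names Observation, WorldModel, AmortizedPerception, toy_detector, and the per-step budget parameter are all hypothetical, and the detector is a stand-in that simply returns pre-labelled detections.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Observation:
    timestamp: float
    object_id: str
    label: str        # symbolic part of the representation
    position: tuple   # numeric ("sub-symbolic") part


@dataclass
class WorldModel:
    # Hypothetical store: object_id -> observations sorted by timestamp.
    history: dict = field(default_factory=dict)

    def integrate(self, obs):
        entries = self.history.setdefault(obs.object_id, [])
        entries.append(obs)
        entries.sort(key=lambda o: o.timestamp)

    def scene_at(self, t):
        """Return the latest known observation of each object at or before time t."""
        scene = {}
        for object_id, entries in self.history.items():
            past = [o for o in entries if o.timestamp <= t]
            if past:
                scene[object_id] = past[-1]
        return scene


class AmortizedPerception:
    """Logs frames immediately and processes them later, a few per step."""

    def __init__(self, world, detector):
        self.world = world
        self.detector = detector
        self.backlog = deque()

    def log_frame(self, timestamp, frame):
        # Cheap: only store the frame; no perception work happens here.
        self.backlog.append((timestamp, frame))

    def step(self, budget=1):
        # Process at most `budget` logged frames, e.g. while the robot is idle.
        processed = 0
        while self.backlog and processed < budget:
            timestamp, frame = self.backlog.popleft()
            for object_id, label, position in self.detector(frame):
                self.world.integrate(Observation(timestamp, object_id, label, position))
            processed += 1
        return processed


def toy_detector(frame):
    # Assumption: a frame is already a list of (object_id, label, position) tuples.
    return frame


world = WorldModel()
perception = AmortizedPerception(world, toy_detector)
perception.log_frame(1.0, [("cup_1", "cup", (0.0, 0.0, 0.8))])
perception.log_frame(2.0, [("cup_1", "cup", (0.5, 0.0, 0.8))])
perception.step(budget=1)   # integrates only the first logged frame
perception.step(budget=1)   # integrates the second frame later
past_scene = world.scene_at(1.5)
current_scene = world.scene_at(2.5)
```

In this sketch, the cost of perception is paid in small increments through step rather than at the moment each image is captured, and scene_at lets a caller ask about an earlier point in time as well as the present.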
Abstract
Mobile robots performing long-term manipulation activities in human environments have to perceive a wide variety of objects with very different visual characteristics and need to keep track of them reliably throughout the execution of a task. To be efficient, robot perception capabilities need to go beyond what is currently perceivable and should be able to answer queries about both current and past scenes. In this paper, we investigate a perception system for long-term robot manipulation that keeps track of the changing environment and builds a representation of the perceived world. Specifically, we introduce an amortized component that spreads perception tasks throughout the execution cycle. The resulting query-driven perception system asynchronously integrates results from logged images into a symbolic and numeric (what we call sub-symbolic) representation that forms…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
