TL;DR
AirObject introduces a novel temporal 3D object encoding method that builds evolving global representations from multiple viewpoints, enabling robust, class-agnostic object identification in robotic applications.
Contribution
It presents the first temporal object encoding approach, which combines graph attention-based encoding with a temporal convolutional network to build robust, evolving object representations.
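The contribution pairs two stages: graph attention over an object's keypoint graph within each frame, and a temporal convolutional network across frames. The pure-Python sketch below illustrates that general pattern only. It is not AirObject's actual architecture. Every function name is hypothetical. The specific choices are assumptions made for brevity: a single-head GAT-style layer with no output nonlinearity, mean pooling to a per-frame embedding, and one depthwise causal dilated convolution standing in for a full TCN. The weights are placeholders, not learned parameters.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Multiply a matrix W (out_dim x in_dim) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def graph_attention(feats: List[Vector], adj: List[List[int]],
                    W: Matrix, a: Vector) -> List[Vector]:
    """Hypothetical single-head GAT-style layer over a keypoint graph.

    adj[i] lists the neighbours of keypoint i; each node also attends to itself.
    """
    h = [matvec(W, f) for f in feats]  # project every keypoint descriptor
    d = len(h[0])
    out = []
    for i, hi in enumerate(h):
        nbrs = sorted(set(adj[i]) | {i})
        # Attention logits e_ij = LeakyReLU(a^T [h_i || h_j]).
        logits = [leaky_relu(sum(a[k] * hi[k] for k in range(d))
                             + sum(a[d + k] * h[j][k] for k in range(d)))
                  for j in nbrs]
        m = max(logits)
        exps = [math.exp(e - m) for e in logits]  # numerically stable softmax
        z = sum(exps)
        alphas = [e / z for e in exps]
        # New node feature: attention-weighted sum of neighbour features.
        out.append([sum(al * h[j][k] for al, j in zip(alphas, nbrs))
                    for k in range(d)])
    return out


def mean_pool(nodes: List[Vector]) -> Vector:
    """Collapse keypoint features into one per-frame object embedding (assumed)."""
    n = len(nodes)
    return [sum(col) / n for col in zip(*nodes)]


def causal_conv(seq: List[Vector], kernel: Vector, dilation: int = 1) -> List[Vector]:
    """Depthwise causal dilated 1-D convolution over time with zero left-padding.

    y_t = sum_k kernel[k] * x_{t - k*dilation}, so each output uses only past frames.
    """
    out = []
    for t in range(len(seq)):
        y = [0.0] * len(seq[t])
        for k, w in enumerate(kernel):
            s = t - k * dilation
            if s >= 0:
                y = [yc + w * xc for yc, xc in zip(y, seq[s])]
        out.append(y)
    return out


def encode_object(frames: List[Tuple[List[Vector], List[List[int]]]],
                  W: Matrix, a: Vector, kernel: Vector, dilation: int = 1) -> Vector:
    """Hypothetical pipeline: per-frame graph attention and pooling, then temporal conv.

    Returns the embedding at the latest time step, which changes as frames arrive.
    """
    per_frame = [mean_pool(graph_attention(f, adj, W, a)) for f, adj in frames]
    return causal_conv(per_frame, kernel, dilation)[-1]
```

As a usage example, take two frames of a two-keypoint object with an identity projection, zero attention parameters (uniform attention), and kernel `[0.5, 0.5]`: `encode_object([([[1, 0], [0, 1]], [[1], [0]]), ([[2, 0], [0, 2]], [[1], [0]])], [[1, 0], [0, 1]], [0, 0, 0, 0], [0.5, 0.5])` blends the per-frame embeddings `[0.5, 0.5]` and `[1.0, 1.0]` into `[0.75, 0.75]`. The causal convolution is what lets the representation "evolve": the embedding at time t never looks at future frames, so it can be updated online as the robot gathers new viewpoints.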
Findings
Achieves state-of-the-art video object identification performance.
Robust to occlusion, viewpoint shifts, and scale changes.
Outperforms existing single-frame and sequential descriptors.
Abstract
Object encoding and identification are vital for robotic tasks such as autonomous exploration, semantic scene understanding, and re-localization. Previous approaches have attempted to either track objects or generate descriptors for object identification. However, such systems are limited to a "fixed" partial object representation from a single viewpoint. In a robot exploration setup, there is a requirement for a temporally "evolving" global object representation built as the robot observes the object from multiple viewpoints. Furthermore, given the vast distribution of unknown novel objects in the real world, the object identification process must be class-agnostic. In this context, we propose a novel temporal 3D object encoding approach, dubbed AirObject, to obtain global keypoint graph-based embeddings of objects. Specifically, the global 3D object embeddings are generated using a…
