Learning to Explore Informative Trajectories and Samples for Embodied Perception
Ya Jing, Tao Kong

TL;DR
This paper introduces a self-supervised exploration policy for embodied perception tasks that efficiently gathers informative samples by building a 3D semantic map, leading to improved perception models and successful real-robot deployment.
Contribution
We propose a novel exploration policy based on semantic distribution maps and uncertainty rewards, enhancing data collection for embodied perception models.
Findings
Our method outperforms baseline exploration policies in perception accuracy.
The approach improves robustness in real-robot experiments.
Semantic distribution-based exploration reduces unnecessary observations.
Abstract
We are witnessing significant progress on perception models, specifically those trained on large-scale internet images. However, how to efficiently generalize these perception models to unseen embodied tasks remains insufficiently studied, even though doing so would help various relevant applications (e.g., home robots). Unlike static perception methods trained on pre-collected images, an embodied agent can move around in the environment and obtain images of objects from any viewpoint. Therefore, efficiently learning the exploration policy and the collection method to gather informative training samples is the key to this task. To do this, we first build a 3D semantic distribution map and use it to train the exploration policy in a self-supervised manner, introducing the semantic distribution disagreement and semantic distribution uncertainty rewards. Note that the map is generated from multi-view observations and can weaken the…
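As a rough illustration of the two reward signals named in the abstract, the sketch below scores a single map cell from the class distributions predicted in several views. It is not the authors' formulation: the fusion rule (a plain average over views), the use of KL divergence for disagreement, the use of entropy for uncertainty, and all function names are assumptions made for this example.

```python
import numpy as np


def fuse_views(view_probs):
    """Fuse per-view class distributions for one map cell.

    Assumption: the fused distribution is the plain mean over views.
    view_probs has shape (num_views, num_classes), each row summing to 1.
    """
    view_probs = np.asarray(view_probs, dtype=float)
    fused = view_probs.mean(axis=0)
    return fused / fused.sum()


def disagreement_reward(view_probs, eps=1e-8):
    """Mean KL(view || fused) over views (assumed disagreement measure).

    Close to zero when every view predicts the same distribution,
    larger when views contradict each other.
    """
    view_probs = np.asarray(view_probs, dtype=float)
    p = np.clip(view_probs, eps, 1.0)
    q = np.clip(fuse_views(view_probs), eps, 1.0)
    kl_per_view = np.sum(p * (np.log(p) - np.log(q)), axis=1)
    return float(kl_per_view.mean())


def uncertainty_reward(view_probs, eps=1e-8):
    """Entropy of the fused distribution (assumed uncertainty measure)."""
    fused = np.clip(fuse_views(view_probs), eps, 1.0)
    return float(-np.sum(fused * np.log(fused)))


# Two views that agree versus two views that contradict each other.
consistent = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
conflicting = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

print(disagreement_reward(consistent), disagreement_reward(conflicting))
print(uncertainty_reward(consistent), uncertainty_reward(conflicting))
```

Under these assumptions, a cell whose views contradict each other scores higher on both measures than a cell seen consistently, which is the kind of signal an exploration policy could be rewarded for seeking out.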
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
