Embodied Concept Learner: Self-supervised Learning of Concepts and Mapping through Instruction Following
Mingyu Ding, Yan Xu, Zhenfang Chen, David Daniel Cox, Ping Luo, Joshua B. Tenenbaum, Chuang Gan

TL;DR
The Embodied Concept Learner (ECL) enables a robot agent to learn visual concepts and depth estimation from human demonstrations and language instructions, without ground-truth semantic or depth supervision, improving instruction following and task reasoning in 3D environments.
Contribution
ECL introduces a modular, self-supervised framework for grounding concepts, constructing semantic maps, and executing tasks from language, with enhanced interpretability and reusability.
Findings
Outperforms previous methods on the ALFRED benchmark.
Learns semantics and depth through interaction, without ground-truth supervision.
Provides a transparent, step-by-step planning process.
Abstract
Humans, even at a very early age, can learn visual concepts and understand geometry and layout through active interaction with the environment, and generalize their compositions to complete tasks described in natural language in novel scenes. To mimic this capability, we propose the Embodied Concept Learner (ECL) in an interactive 3D environment. Specifically, a robot agent can ground visual concepts, build semantic maps, and plan actions to complete tasks by learning purely from human demonstrations and language instructions, without access to ground-truth semantic and depth supervision from simulation. ECL consists of: (i) an instruction parser that translates natural language into executable programs; (ii) an embodied concept learner that grounds visual concepts based on language descriptions; (iii) a map constructor that estimates depth and constructs semantic maps by leveraging…
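To make component (i) concrete, the sketch below shows what "translating natural language into an executable program" could look like in its simplest form. It is an illustration only: the `Step` type, the `VERB_TO_OP` table, the operation names, and the rule-based splitting are all hypothetical and are not taken from the paper, whose parser is a learned module.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    """One step of a toy executable program (hypothetical structure)."""
    op: str      # hypothetical operation name, e.g. "GOTO"
    target: str  # the object concept the operation refers to


# Hypothetical verb-to-operation table. This is a stand-in for illustration;
# it is NOT the vocabulary or the mechanism used by ECL.
VERB_TO_OP = {
    "go to": "GOTO",
    "pick up": "PICKUP",
    "open": "OPEN",
}


def parse_instruction(text: str) -> List[Step]:
    """Split an instruction into clauses and map each clause to a Step."""
    steps = []
    # Clauses are separated by commas or the words "then" / "and".
    for clause in re.split(r",|\bthen\b|\band\b", text.lower()):
        clause = clause.strip()
        for verb, op in VERB_TO_OP.items():
            if clause.startswith(verb):
                target = clause[len(verb):].strip()
                # Drop a leading article so the target is a bare concept name.
                target = re.sub(r"^(the|a|an)\s+", "", target)
                steps.append(Step(op, target))
                break
    return steps


if __name__ == "__main__":
    for step in parse_instruction("Go to the table, then pick up the mug"):
        print(step.op, step.target)
```

In a modular design of this kind, each `target` string would then be handed to the concept-grounding component to be located in the scene, which is what keeps the intermediate plan readable step by step.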
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Reinforcement Learning in Robotics
