ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics

Letian Chen; Nina Moorman; Matthew Gombolay

arXiv:2411.18825 · cs.RO · May 13, 2025


TL;DR

ELEMENTAL integrates visual demonstrations with language guidance and iterative self-reflection to improve reward design and generalization in robotic reinforcement learning, surpassing prior methods.

Contribution

ELEMENTAL introduces a framework that combines visual demonstrations and natural language for reward learning, addressing the limitations of LLM-based reward design in robotic tasks.

Findings

01. Outperforms prior methods by 42.3% in task success.

02. Achieves 41.3% better generalization on out-of-distribution tasks.

03. Demonstrates robustness when learning from demonstrations and language.

Abstract

Reinforcement learning (RL) has demonstrated compelling performance in robotic tasks, but its success often hinges on the design of complex, ad hoc reward functions. Researchers have explored how Large Language Models (LLMs) could enable non-expert users to specify reward functions more easily. However, LLMs struggle to balance the importance of different features, generalize poorly to out-of-distribution robotic tasks, and cannot properly represent the problem from text-based descriptions alone. To address these challenges, we propose ELEMENTAL (intEractive LEarning froM dEmoNstraTion And Language), a novel framework that combines natural language guidance with visual user demonstrations to better align robot behavior with user intentions. By incorporating visual inputs, ELEMENTAL overcomes the limitations of text-only task specifications, while leveraging inverse reinforcement learning…
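The abstract's claim that LLMs struggle to balance feature importance, and that inverse reinforcement learning (IRL) on demonstrations can help, can be illustrated with a generic feature-matching step. The sketch below is a minimal maximum-entropy IRL example over a small, finite set of candidate trajectories, assuming each trajectory is already summarized by a feature vector. It is not ELEMENTAL's implementation: the function name `maxent_irl_weights`, the two hand-written features, and the learning-rate and step-count values are hypothetical choices for illustration only.

```python
import numpy as np


def maxent_irl_weights(candidate_features, demo_features, lr=0.1, steps=500):
    """Fit linear reward weights w (hypothetical helper, illustration only).

    Models P(tau) proportional to exp(w . phi(tau)) over a finite candidate set
    and ascends the demonstrations' log-likelihood, whose gradient is
    (mean demo features) - (expected features under P).

    candidate_features: (num_candidates, num_features) array, one row per trajectory
    demo_features:      (num_demos, num_features) array, one row per demonstration
    """
    w = np.zeros(candidate_features.shape[1])
    target = demo_features.mean(axis=0)  # empirical feature expectation of demos
    for _ in range(steps):
        scores = candidate_features @ w
        scores = scores - scores.max()  # subtract max for numerical stability
        probs = np.exp(scores)
        probs = probs / probs.sum()  # max-entropy distribution over candidates
        expected = probs @ candidate_features  # model feature expectation
        w = w + lr * (target - expected)  # gradient ascent step
    return w


# Hypothetical features per trajectory: [negative distance to goal, collision count]
candidates = np.array([
    [-0.1, 0.0],  # reaches the goal without colliding (matches the demo)
    [-0.1, 1.0],  # reaches the goal but collides
    [-1.0, 0.0],  # stays far from the goal
])
demos = np.array([[-0.1, 0.0]])

w = maxent_irl_weights(candidates, demos)
best = int(np.argmax(candidates @ w))  # index of the highest-reward candidate
```

Because the demonstration reaches the goal without colliding, the fitted weight on goal progress becomes positive and the weight on collisions becomes negative, so the demonstration, rather than a hand-picked weighting, determines the trade-off between features.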


Videos

ELEMENTAL: Interactive Learning from Demonstrations and Vision-Language Models for Reward Design in Robotics (SlidesLive)

Taxonomy

Topics: Manufacturing Process and Optimization · Robot Manipulation and Learning

Methods: ALIGN · High-Order Consensuses