Reward-Zero: Language Embedding Driven Implicit Reward Mechanisms for Reinforcement Learning
Heng Zhang, Haddy Alchaer, Arash Ajoudani, Yu She

TL;DR
Reward-Zero introduces an implicit reward mechanism based on language embeddings. It derives semantic progress signals from natural-language task descriptions, which improves the efficiency and generalization of reinforcement learning.
Contribution
It presents a novel universal reward function leveraging language embeddings to enhance RL training without task-specific engineering.
Findings
Agents with Reward-Zero converge faster and achieve higher success rates.
It stabilizes training and improves exploration in RL tasks.
It solves complex tasks on which traditional reward designs fail.
Abstract
We introduce Reward-Zero, a general-purpose implicit reward mechanism that transforms natural-language task descriptions into dense, semantically grounded progress signals for reinforcement learning (RL). Reward-Zero serves as a simple yet sophisticated universal reward function that leverages language embeddings for efficient RL training. By comparing the embedding of a task specification with embeddings derived from an agent's interaction experience, Reward-Zero produces a continuous, semantically aligned sense-of-completion signal. This reward supplements sparse or delayed environmental feedback without requiring task-specific engineering. When integrated into standard RL frameworks, it accelerates exploration, stabilizes training, and enhances generalization across diverse tasks. Empirically, agents trained with Reward-Zero converge faster and achieve higher final success rates than…
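The abstract says the reward comes from comparing a task-specification embedding with embeddings of the agent's experience. The sketch below shows one way such a signal could be computed and added to the environment reward.

It rests on several assumptions that are not taken from the paper:
- The embeddings are plain float vectors. In practice they would come from some language or vision-language encoder, which is left out here.
- Cosine similarity is rescaled to a completion score in [0, 1]. The rescaling is an illustrative choice.
- The supplement is a weighted, potential-based progress term in the style of Ng et al. (1999), swapped in as a familiar shaping technique. It is not presented as the authors' formulation.
- The names `completion_signal`, `shaped_reward`, `weight` and `gamma` are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined direction: treat as no semantic match
    return dot / (norm_a * norm_b)


def completion_signal(task_emb: Sequence[float], experience_emb: Sequence[float]) -> float:
    """Hypothetical sense-of-completion score: cosine similarity rescaled from [-1, 1] to [0, 1]."""
    return 0.5 * (cosine_similarity(task_emb, experience_emb) + 1.0)


def shaped_reward(
    env_reward: float,
    task_emb: Sequence[float],
    prev_emb: Sequence[float],
    curr_emb: Sequence[float],
    weight: float = 0.1,   # assumed scale of the implicit reward
    gamma: float = 0.99,   # discount factor used in the potential-based term
) -> float:
    """Environment reward plus a weighted potential-based progress bonus (assumed form)."""
    phi_prev = completion_signal(task_emb, prev_emb)
    phi_curr = completion_signal(task_emb, curr_emb)
    return env_reward + weight * (gamma * phi_curr - phi_prev)


if __name__ == "__main__":
    # Toy placeholder embeddings; a real system would obtain these from an encoder.
    task = [1.0, 0.0, 0.0]
    prev = [0.0, 1.0, 0.0]   # unrelated to the task
    curr = [1.0, 1.0, 0.0]   # partially aligned with the task
    # Sparse environment reward is 0; the bonus is positive because alignment increased.
    print(shaped_reward(0.0, task, prev, curr))
```

With a sparse environment reward of zero, the bonus alone is positive when the experience embedding moves toward the task embedding. It is negative when the embedding moves away. The potential-based form is used here because it leaves the optimal policy of the original task unchanged. Whether Reward-Zero has that property is not stated in the text above.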
Taxonomy
Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Action Observation and Synchronization
