Rewarding DINO: Predicting Dense Rewards with Vision Foundation Models
Pierre Krack, Tobias Jülg, Wolfram Burgard, Florian Walter

TL;DR
Rewarding DINO introduces a vision-based, language-conditioned reward model that learns task semantics from visual data, enabling generalization and real-world application in robot manipulation without relying on privileged information.
Contribution
The paper presents Rewarding DINO, a compact reward prediction model that is trained on diverse tasks, generalizes to unseen tasks, and can replace analytical rewards in reinforcement learning.
Findings
Achieves competitive accuracy and correlation in reward prediction.
Generalizes to new simulation and real-world tasks.
Provides rewards that are effective for training reinforcement learning agents to complete tasks.
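This page reports correlation between predicted and reference rewards but does not say which coefficient is used. A minimal sketch, assuming Pearson correlation computed against analytical (simulator) rewards along a rollout; the function name `pearson` is hypothetical and not taken from the paper:

```python
import math


def pearson(pred, target):
    """Pearson correlation between predicted and reference rewards (assumed metric)."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    # Covariance and standard deviations, all left unnormalized by n (it cancels).
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    std_t = math.sqrt(sum((t - mean_t) ** 2 for t in target))
    if std_p == 0 or std_t == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (std_p * std_t)


# Hypothetical rollout: a dense analytical reward that tracks task progress,
# and a reward model's predictions for the same camera frames.
analytical = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted = [0.1, 0.2, 0.55, 0.7, 0.95]
print(pearson(predicted, analytical))
```

Correlation rewards a model for ranking states by progress in the same order as the reference, even if the absolute scale differs. That ranking is what a dense reward needs to convey to a reinforcement learning agent.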
Abstract
Well-designed dense reward functions in robot manipulation not only indicate whether a task is completed but also encode progress along the way. Designing dense rewards is generally challenging and usually requires access to privileged state information, which is available only in simulation and not in real-world experiments. This makes reward prediction models that infer task state information from camera images attractive. A common approach is to predict rewards from expert demonstrations based on visual similarity or sequential frame ordering. However, this biases the resulting reward function towards a specific solution and leaves it undefined in states not covered by the demonstrations. In this work, we introduce Rewarding DINO, a method for language-conditioned reward modeling that learns actual reward functions rather than specific trajectories. The model's compact size allows it to serve as…
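The idea of a language-conditioned reward model that regresses onto a reward function, rather than imitating demonstration trajectories, can be illustrated with a toy example. The sketch below is hypothetical and is not the paper's architecture. It assumes precomputed image features (standing in for frozen vision foundation model features), precomputed instruction embeddings, and reference rewards from a simulator as regression targets. The image and text vectors are combined by an elementwise product, so the same image can receive different rewards under different instructions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ConditionedRewardHead:
    """Toy language-conditioned reward head (hypothetical, not the paper's)."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.b = 0.0

    def features(self, img_feat, txt_feat):
        # Elementwise product: the instruction selects which image features matter.
        return [i * t for i, t in zip(img_feat, txt_feat)]

    def predict(self, img_feat, txt_feat):
        x = self.features(img_feat, txt_feat)
        return sigmoid(sum(w * xi for w, xi in zip(self.w, x)) + self.b)

    def sgd_step(self, img_feat, txt_feat, target, lr):
        """One gradient step on the squared error 0.5 * (r - target)^2."""
        x = self.features(img_feat, txt_feat)
        r = self.predict(img_feat, txt_feat)
        g = (r - target) * r * (1.0 - r)  # derivative of the loss w.r.t. the logit
        self.w = [w - lr * g * xi for w, xi in zip(self.w, x)]
        self.b -= lr * g
        return 0.5 * (r - target) ** 2


def train(head, data, epochs=500, lr=0.5):
    """Regress predicted rewards onto reference rewards; returns per-epoch loss."""
    history = []
    for _ in range(epochs):
        history.append(sum(head.sgd_step(img, txt, r, lr) for img, txt, r in data))
    return history


# Hypothetical stand-ins: 2-d "instruction embeddings" for two tasks, and
# 2-d "image features" whose k-th entry encodes progress on task k.
TXT_A, TXT_B = [1.0, 0.0], [0.0, 1.0]
data = []
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    data.append(([p, 1.0 - p], TXT_A, p))  # reference reward = progress on task A
    data.append(([1.0 - p, p], TXT_B, p))  # reference reward = progress on task B

head = ConditionedRewardHead(dim=2)
losses = train(head, data)
```

After training, an image showing task A completed, `[1.0, 0.0]`, should score higher under `TXT_A` than under `TXT_B`. Because the supervision is a reward value for each state rather than a demonstration trajectory, the model is also defined for states that no demonstration visits.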
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
