VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

Kuo-Han Hung; Pang-Chi Lo; Jia-Fong Yeh; Han-Yuan Hsu; Yi-Ting Chen; Winston H. Hsu

arXiv:2405.16545·cs.RO·February 21, 2025

TL;DR

VICtoR is a hierarchical reward model, trained on action-free videos and language instructions, that improves long-horizon manipulation by assessing task progress at two levels: which stage the task is in, and how far the current motion within that stage has progressed.

Contribution

It introduces VICtoR, a hierarchical reward model that combines a stage detector with a motion progress evaluator to provide more informative rewards for learning long-horizon tasks.

Findings

01

Outperforms the best existing VIC methods, with a 43% improvement in success rates on long-horizon tasks.

02

Effective in both simulated and real-world environments.

03

Provides precise, multi-level reward signals for complex tasks.

Abstract

We study reward models for long-horizon manipulation tasks by learning from action-free videos and language instructions, which we term the visual-instruction correlation (VIC) problem. Recent advancements in cross-modality modeling have highlighted the potential of reward modeling through visual and language correlations. However, existing VIC methods face challenges in learning rewards for long-horizon tasks due to their lack of sub-stage awareness, difficulty in modeling task complexities, and inadequate object state estimation. To address these challenges, we introduce VICtoR, a novel hierarchical VIC reward model capable of providing effective reward signals for long-horizon manipulation tasks. VICtoR precisely assesses task progress at various levels through a novel stage detector and motion progress evaluator, offering insightful guidance for agents learning the task effectively.…
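The abstract says progress is assessed at two levels, the task stage and the motion within a stage, but it does not state how these are turned into a scalar reward. The sketch below is a minimal illustration under stated assumptions, not VICtoR's published reward: it assumes a hypothetical stage detector that outputs a 0-based stage index and a hypothetical motion evaluator that outputs within-stage progress in [0, 1], merges them into one task-progress value, and rewards each transition by the change in that value. The names `StepEstimate`, `task_progress`, and `progress_rewards` and the combination formula are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StepEstimate:
    """Hypothetical per-frame outputs of a hierarchical progress model."""
    stage_index: int        # assumed stage-detector output: 0-based current stage
    motion_progress: float  # assumed motion-evaluator output: progress within the stage


def task_progress(est: StepEstimate, num_stages: int) -> float:
    """Merge stage-level and motion-level estimates into one value in [0, 1].

    Illustrative assumption: each stage covers an equal share of the task.
    """
    if not 0 <= est.stage_index < num_stages:
        raise ValueError("stage_index out of range")
    motion = min(max(est.motion_progress, 0.0), 1.0)  # clamp noisy estimates
    return (est.stage_index + motion) / num_stages


def progress_rewards(estimates: Sequence[StepEstimate], num_stages: int) -> List[float]:
    """Reward each transition by the change in estimated task progress.

    Moving forward (within a stage or into the next one) gives positive reward;
    regressing to an earlier stage or motion phase gives negative reward.
    """
    progress = [task_progress(e, num_stages) for e in estimates]
    return [after - before for before, after in zip(progress, progress[1:])]


if __name__ == "__main__":
    # A hypothetical three-stage trajectory (e.g. reach, grasp, place).
    trajectory = [
        StepEstimate(0, 0.0),
        StepEstimate(0, 0.5),
        StepEstimate(1, 0.0),
        StepEstimate(1, 0.5),
        StepEstimate(2, 1.0),
    ]
    print(progress_rewards(trajectory, num_stages=3))
```

Rewarding the difference in progress, rather than progress itself, makes the episode return telescope to final minus initial progress, so an agent cannot accumulate reward by oscillating back and forth inside a stage.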

Videos

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation (SlidesLive)

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Domain Adaptation and Few-Shot Learning