Human-assisted Robotic Policy Refinement via Action Preference Optimization

Wenke Xia; Yichu Yang; Hongtao Wu; Xiao Ma; Tao Kong; Di Hu

arXiv:2506.07127 · cs.RO · October 31, 2025

TL;DR

This paper introduces Action Preference Optimization (APO), a human-assisted method to refine Vision-Language-Action models for robotics, enabling better failure correction and robustness in real-world manipulation tasks.

Contribution

The paper presents APO, a novel adaptive reweighting algorithm that improves VLA models by incorporating human interaction data for post-deployment refinement.

Findings

01. APO improves model robustness in simulation and real-world tasks.
02. The method effectively suppresses failure-prone actions.
03. APO improves generalization in dynamic environments.

Abstract

Establishing a reliable and iteratively refined robotic system is essential for deploying real-world applications. While Vision-Language-Action (VLA) models are widely recognized as the foundation model for such robotic deployment, their reliance on offline expert demonstrations critically limits their capacity for post-deployment refinement. To mitigate this limitation, we introduce Action Preference Optimization (APO), a method designed to refine VLA models by human-assisted preference alignment gathered through interaction with environments. This method begins with a human-robot collaboration framework for reliable failure correction and interaction trajectory collection through human intervention. However, directly leveraging these interaction trajectories for preference optimization is non-trivial due to the challenges of irreversible robotic actions and token distribution…
