Human-assisted Robotic Policy Refinement via Action Preference Optimization
Wenke Xia, Yichu Yang, Hongtao Wu, Xiao Ma, Tao Kong, Di Hu

TL;DR
This paper introduces Action Preference Optimization (APO), a human-assisted method to refine Vision-Language-Action (VLA) models for robotics, enabling better failure correction and robustness in real-world manipulation tasks.
Contribution
The paper presents APO, a novel adaptive reweighting algorithm that improves VLA models by incorporating human interaction data for post-deployment refinement.
Findings
APO improves model robustness in simulation and real-world tasks.
The method effectively suppresses failure-prone actions.
APO enhances generalization in dynamic environments.
Abstract
Establishing a reliable and iteratively refined robotic system is essential for deploying real-world applications. While Vision-Language-Action (VLA) models are widely recognized as the foundation model for such robotic deployment, their reliance on offline expert demonstrations critically limits their capacity for post-deployment refinement. To mitigate this limitation, we introduce Action Preference Optimization (APO), a method designed to refine VLA models by human-assisted preference alignment gathered through interaction with environments. This method begins with a human-robot collaboration framework for reliable failure correction and interaction trajectory collection through human intervention. However, directly leveraging these interaction trajectories for preference optimization is non-trivial due to the challenges of irreversible robotic actions and token distribution…
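To make the intervention-based collection step more concrete, here is a minimal Python sketch of one way a single interaction trajectory could be turned into per-step preference labels. It is an illustrative assumption, not the procedure APO actually uses: the `Step` record, the `label_steps` function, and the `pre_window` rule (treating the few robot actions just before a human takeover as failure-prone) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    action: List[float]  # placeholder action vector (hypothetical format)
    by_human: bool       # True if a human operator issued this action


def label_steps(traj: List[Step], pre_window: int = 3) -> List[int]:
    """Assign a hypothetical preference label to each step of one trajectory.

    +1: human corrective action (treated as desirable)
    -1: one of the `pre_window` robot actions right before a human takeover
        (assumed, for illustration only, to be failure-prone)
     0: any other robot action (left unlabeled)
    """
    labels = [0] * len(traj)
    for i, step in enumerate(traj):
        if not step.by_human:
            continue
        labels[i] = 1
        # A takeover begins where a robot step is immediately followed by a human step.
        if i > 0 and not traj[i - 1].by_human:
            for j in range(max(0, i - pre_window), i):
                if not traj[j].by_human:
                    labels[j] = -1
    return labels


if __name__ == "__main__":
    pattern = [False, False, False, False, True, True, False, False]
    traj = [Step([0.0], h) for h in pattern]
    print(label_steps(traj, pre_window=2))
```

With `pre_window=2`, the two robot steps before the takeover at index 4 are marked `-1`, the two human steps `+1`, and the rest stay `0`. A preference-optimization objective could then up-weight the `+1` actions and penalize the `-1` actions; how APO actually derives and reweights such signals is described in the paper, not here.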
