RFTF: Reinforcement Fine-tuning for Embodied Agents with Temporal Feedback
Junyang Shu, Zhiwei Lin, Yongtao Wang

TL;DR
This paper introduces RFTF, a reinforcement fine-tuning method using a value model with temporal feedback to improve embodied agents' performance and adaptability in complex tasks.
Contribution
RFTF leverages a value model trained with temporal information to generate dense rewards, enhancing fine-tuning effectiveness without costly action labels.
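The idea of turning value-model outputs into dense rewards can be sketched as follows. This is a minimal illustration, not the paper's formulation, which this summary does not spell out. It assumes a hypothetical value model has already scored each step of an episode, and it uses a temporal-difference style reward (the change in predicted value between consecutive steps) plus an optional sparse outcome bonus. The function names, the `gamma` and `outcome_bonus` parameters, and the pairwise ranking loss are all assumptions made for illustration.

```python
from typing import List


def dense_rewards_from_values(
    values: List[float],
    success: bool,
    gamma: float = 1.0,
    outcome_bonus: float = 1.0,
) -> List[float]:
    """Assumed shaping rule: r_t = gamma * V(s_{t+1}) - V(s_t).

    `values` holds one (hypothetical) value-model score per state.
    A sparse bonus is added to the last step if the episode succeeded.
    """
    if len(values) < 2:
        raise ValueError("need at least two value estimates")
    rewards = [gamma * values[t + 1] - values[t] for t in range(len(values) - 1)]
    if success:
        rewards[-1] += outcome_bonus
    return rewards


def temporal_ranking_loss(values: List[float], margin: float = 0.0) -> float:
    """Assumed training signal that needs only temporal order, not action labels.

    Mean hinge penalty over all pairs (i < j) where the later state j is not
    valued at least `margin` higher than the earlier state i.
    """
    total, pairs = 0.0, 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            total += max(0.0, margin + values[i] - values[j])
            pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    episode_values = [0.0, 0.25, 0.5, 1.0]  # made-up scores for a 4-state episode
    print(dense_rewards_from_values(episode_values, success=True))
    print(temporal_ranking_loss(episode_values))
```

Under these assumptions, every step that moves the agent toward a higher-valued state receives positive feedback, whereas an outcome-only reward would credit only the final step.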
Findings
Achieved state-of-the-art success rates on CALVIN ABC-D.
Enabled rapid adaptation to new environments.
Improved generalization and manipulation capabilities.
Abstract
Vision-Language-Action (VLA) models have demonstrated significant potential in the field of embodied intelligence, enabling agents to follow human instructions to complete complex tasks in physical environments. Existing embodied agents are often trained through behavior cloning, which requires expensive data and computational resources and is constrained by human demonstrations. To address this issue, many researchers explore the application of reinforcement fine-tuning to embodied agents. However, typical reinforcement fine-tuning methods for embodied agents usually rely on sparse, outcome-based rewards, which struggle to provide fine-grained feedback for specific actions within an episode, thus limiting the model's manipulation capabilities and generalization performance. In this paper, we propose RFTF, a novel reinforcement fine-tuning method that leverages a value model to generate…
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Multi-Agent Systems and Negotiation
