Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

Weirui Ye; Yunsheng Zhang; Haoyang Weng; Xianfan Gu; Shengjie Wang; Tong Zhang; Mengchen Wang; Pieter Abbeel; Yang Gao

arXiv:2310.02635 · cs.RO · April 24, 2026

TL;DR

This paper introduces RLFP, a framework that uses policy, value, and success-reward foundation models to help embodied agents learn manipulation tasks sample-efficiently and with minimal reward engineering. Its FAC algorithm reaches an 86% success rate on real robots after one hour of training and outperforms baseline methods in simulation.

Contribution

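The paper proposes the RLFP framework and, within it, the Foundation-guided Actor-Critic (FAC) algorithm. FAC draws guidance and feedback from policy, value, and success-reward foundation models, which yields sample-efficient learning and robust performance with minimal reward engineering.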

Findings

01 FAC achieves an 86% success rate on real robots after one hour of training.

02 FAC outperforms baseline methods in simulation while using fewer environment frames.

03 The framework is robust to noisy priors and agnostic to the type of foundation model used.

Abstract

Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, it is challenging to apply RL algorithms directly in the real world. For one thing, RL is data-intensive and typically requires millions of interactions with the environment, which is impractical in real scenarios. For another, designing reward functions manually takes heavy engineering effort. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) sample efficient;…
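The abstract names the three priors but, as excerpted here, does not say how they enter the learning update. As a purely illustrative sketch, and not the authors' implementation, the snippet below assumes that the success-reward prior supplies a sparse 0/1 task reward and that the value prior is used as a potential function for standard potential-based reward shaping. The function names, the toy one-dimensional state, and the toy priors are all hypothetical stand-ins for real foundation models. The policy prior is not covered.

```python
from typing import Callable, Sequence

State = Sequence[float]


def shaped_reward(
    state: State,
    next_state: State,
    success_prior: Callable[[State], bool],
    value_prior: Callable[[State], float],
    gamma: float = 0.99,
) -> float:
    """Sparse success reward plus a potential-based shaping term.

    Assumption: the value prior acts as the potential, so the shaping
    term is gamma * V(s') - V(s). This is the generic shaping form,
    not necessarily the exact formulation used in the paper.
    """
    success = 1.0 if success_prior(next_state) else 0.0
    shaping = gamma * value_prior(next_state) - value_prior(state)
    return success + shaping


# Hypothetical toy priors: the goal sits at x = 1.0 on a line.
def toy_value_prior(state: State) -> float:
    # Closer to the goal means a higher (less negative) value estimate.
    return -abs(state[0] - 1.0)


def toy_success_prior(state: State) -> bool:
    # Declares success once the state is within 0.05 of the goal.
    return abs(state[0] - 1.0) < 0.05


# gamma = 1.0 keeps the arithmetic easy to check by hand.
r_toward = shaped_reward([0.0], [0.5], toy_success_prior, toy_value_prior, gamma=1.0)
r_away = shaped_reward([0.5], [0.0], toy_success_prior, toy_value_prior, gamma=1.0)
r_goal = shaped_reward([0.5], [1.0], toy_success_prior, toy_value_prior, gamma=1.0)
```

In this toy setting, moving toward the goal earns a positive shaped reward, moving away earns a negative one, and reaching the goal adds the sparse success bonus on top. No hand-designed reward term is needed beyond what the two priors supply.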

Code & Models

Repositories

https://yewr.github.io/rlfp
