Text-Aware Diffusion for Policy Learning

Calvin Luo; Mandy He; Zilai Zeng; Chen Sun

arXiv:2407.01903 · cs.LG · November 1, 2024

TL;DR

TADPoLe leverages a pretrained text-conditioned diffusion model to provide zero-shot reward signals, enabling natural language-guided policy learning in robotics and locomotion without expert demonstrations.

Contribution

The paper introduces TADPoLe, a method that uses a pretrained, frozen text-conditioned diffusion model to compute dense, zero-shot reward signals, so that text-aligned policies can be learned without hand-designed reward functions.

Findings

01

Learns goal-achievement and locomotion behaviors specified in natural language.

02

Learned policies are judged more natural in human evaluation.

03

Performs competitively on robotic manipulation tasks without in-domain demonstrations.

Abstract

Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning. We hypothesize that large-scale pretrained generative models encode rich priors that can supervise a policy to behave not only in a text-aligned manner, but also in alignment with a notion of naturalness summarized from internet-scale training data. In our experiments, we demonstrate that TADPoLe is able to learn…
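To make the idea of a diffusion-derived reward concrete, here is a minimal Python sketch of one way a frozen noise predictor could score a rendered frame against a text prompt. Everything in it is an assumption for illustration: `toy_predict_noise` is a hypothetical stand-in for a real pretrained diffusion model, `ALPHA_BAR` is an arbitrary noise level, the prompt is represented by a hand-written target vector, and the reward formula (how much the text conditioning improves recovery of the injected noise) is an illustrative choice, not the formulation stated in the paper.

```python
import math
import random

ALPHA_BAR = 0.5  # assumed cumulative noise-schedule value at one fixed timestep


def add_noise(frame, noise):
    """Forward diffusion step: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    sa, sn = math.sqrt(ALPHA_BAR), math.sqrt(1.0 - ALPHA_BAR)
    return [sa * x + sn * e for x, e in zip(frame, noise)]


def toy_predict_noise(noisy, clean_guess):
    """Hypothetical frozen denoiser.

    It returns the noise implied by its own guess of the clean frame.
    A real model would produce this prediction with a neural network.
    """
    sa, sn = math.sqrt(ALPHA_BAR), math.sqrt(1.0 - ALPHA_BAR)
    return [(xt - sa * g) / sn for xt, g in zip(noisy, clean_guess)]


def squared_error(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def text_reward(frame, text_target, rng):
    """Score a frame by how much text conditioning helps recover the noise.

    The reward is positive when the text-conditioned prediction is closer
    to the injected noise than the unconditional prediction.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in frame]
    noisy = add_noise(frame, noise)
    cond = toy_predict_noise(noisy, text_target)            # prompt-conditioned
    uncond = toy_predict_noise(noisy, [0.0] * len(frame))   # no prompt
    return squared_error(uncond, noise) - squared_error(cond, noise)


rng = random.Random(0)
target = [1.0, -1.0, 1.0, -1.0]        # hypothetical "appearance" of the prompt
aligned_frame = list(target)           # frame that matches the prompt
misaligned_frame = [-x for x in target]  # frame that contradicts the prompt

r_aligned = text_reward(aligned_frame, target, rng)
r_misaligned = text_reward(misaligned_frame, target, rng)
print("aligned frame scores higher:", r_aligned > r_misaligned)
```

In a reinforcement learning loop, a score of this kind would be computed on each rendered observation and passed to the policy optimizer in place of a hand-designed reward, with the diffusion model's weights never updated.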

Videos

Text-Aware Diffusion for Policy Learning · SlidesLive

Taxonomy

Topics: Access Control and Trust

Methods: Diffusion