Text-Aware Diffusion for Policy Learning
Calvin Luo, Mandy He, Zilai Zeng, Chen Sun

TL;DR
TADPoLe leverages a pretrained text-conditioned diffusion model to provide zero-shot reward signals, enabling natural language-guided policy learning in robotics and locomotion without expert demonstrations.
Contribution
The paper introduces TADPoLe, a method that uses a pretrained, frozen text-conditioned diffusion model to compute dense, zero-shot reward signals for text-aligned policy learning.
Findings
TADPoLe learns goal-achieving and locomotion behaviors specified in natural language.
The learned policies are judged more natural and human-like in evaluations.
TADPoLe performs competitively on robotic manipulation tasks without in-domain demonstrations.
Abstract
Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning. We hypothesize that large-scale pretrained generative models encode rich priors that can supervise a policy to behave not only in a text-aligned manner, but also in alignment with a notion of naturalness summarized from internet-scale training data. In our experiments, we demonstrate that TADPoLe is able to learn…
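To make the idea of a diffusion-derived reward concrete, here is a minimal toy sketch. It is not the paper's reward formulation, which the truncated abstract above does not give. It assumes one plausible scheme: add noise to a rendered frame, ask a frozen noise predictor for its estimate with and without the text condition, and reward the policy when the text condition helps explain the noise. The names `toy_denoiser` and `text_aligned_reward` are hypothetical, and the "denoiser" is a hand-written stand-in, not a real pretrained model.

```python
import random


def toy_denoiser(noisy, text_target=None):
    """Hypothetical stand-in for a frozen, pretrained noise predictor.

    When conditioned, it "believes" the clean frame equals text_target;
    when unconditioned, it believes the clean frame is all zeros.
    It returns its estimate of the noise that was added.
    """
    believed_clean = text_target if text_target is not None else [0.0] * len(noisy)
    return [n - c for n, c in zip(noisy, believed_clean)]


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def text_aligned_reward(frame, text_target, noise_scale=0.5, seed=0):
    """Assumed reward: how much the text condition improves noise prediction."""
    rng = random.Random(seed)
    # Corrupt the rendered frame with Gaussian noise (a single noise level).
    added_noise = [noise_scale * rng.gauss(0.0, 1.0) for _ in frame]
    noisy = [f + e for f, e in zip(frame, added_noise)]
    # Query the frozen model twice; no gradients or fine-tuning are involved.
    pred_cond = toy_denoiser(noisy, text_target)
    pred_uncond = toy_denoiser(noisy)
    # Positive when the text-conditioned prediction is closer to the true noise.
    return mse(pred_uncond, added_noise) - mse(pred_cond, added_noise)


# A frame that matches the (toy) text target versus one that does not.
target = [1.0, 0.0, 1.0, 0.0]
aligned_reward = text_aligned_reward([1.0, 0.0, 1.0, 0.0], target)
misaligned_reward = text_aligned_reward([0.0, 1.0, 0.0, 1.0], target)
```

In this toy setup the frame that matches the text target receives the higher reward. A reinforcement learning loop would compute such a value for each rendered frame and pass it to the policy optimizer as a dense reward.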
Taxonomy
Topics: Access Control and Trust
Methods: Diffusion
