Task-Agnostic Pre-training and Task-Guided Fine-tuning for Versatile Diffusion Planner
Chenyou Fan, Chenjia Bai, Zhao Shan, Haoran He, Yang Zhang, Zhen Wang

TL;DR
This paper introduces SODP, a two-stage diffusion planning framework that pre-trains on large-scale, sub-optimal multi-task data and fine-tunes with task-specific rewards, enabling versatile and efficient task adaptation.
Contribution
The paper presents a two-stage approach for multi-task diffusion planning that combines pre-training on sub-optimal data with RL-based fine-tuning.
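To make the two-stage idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. Stage 1 uses the standard noise-prediction objective for diffusion models on unlabeled trajectories. Stage 2 uses a reward-weighted version of the same loss as a simple stand-in for the RL-based fine-tuning, whose details this summary does not give. All function names, the linear noise schedule, the zero "denoiser" output, and the reward values are assumptions chosen for illustration.

```python
import math
import random

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    alpha_bar, running = [], 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar

def add_noise(x0, t, alpha_bar, rng):
    """Forward diffusion on one flattened trajectory:
    x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    a = alpha_bar[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return x_t, eps

def sample_loss(eps_pred, eps):
    """Mean squared noise-prediction error for one trajectory."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)

def pretrain_loss(losses):
    """Stage 1 (task-agnostic): every trajectory counts equally, no rewards."""
    return sum(losses) / len(losses)

def reward_weighted_loss(losses, rewards, temperature=1.0):
    """Stage 2 stand-in (assumption): softmax-weight each trajectory by reward."""
    top = max(rewards)  # subtract the max for numerical stability
    exps = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(exps)
    return sum(w / total * l for w, l in zip(exps, losses))

rng = random.Random(0)
alpha_bar = make_alpha_bar(100)
# Four toy trajectories, each flattened to 24 numbers (e.g. horizon 8 x 3 dims).
batch = [[rng.gauss(0.0, 1.0) for _ in range(24)] for _ in range(4)]
losses = []
for x0 in batch:
    t = rng.randrange(100)                 # random diffusion step
    x_t, eps = add_noise(x0, t, alpha_bar, rng)
    eps_pred = [0.0] * len(eps)            # placeholder for a denoiser's output
    losses.append(sample_loss(eps_pred, eps))

rewards = [0.1, 0.9, 0.5, 0.2]             # hypothetical task rewards
stage1 = pretrain_loss(losses)
stage2 = reward_weighted_loss(losses, rewards)
```

With equal rewards the stage-2 loss reduces to the stage-1 loss, which shows how the reward signal only reweights which trajectories the planner is pulled toward. A real system would replace the zero placeholder with a trained network and would likely use a policy-gradient style update rather than this weighting.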
Findings
Outperforms state-of-the-art methods in multi-task domains.
Requires less data for reward-guided fine-tuning.
Effectively leverages sub-optimal trajectories for generalizable planning.
Abstract
Diffusion models have demonstrated their ability to model multi-task trajectories. However, existing multi-task planners or policies typically rely on task-specific demonstrations via multi-task imitation, or require task-specific reward labels to facilitate policy optimization via Reinforcement Learning (RL). Both approaches are costly because of the substantial human effort required to collect expert data or design reward functions. To address these challenges, we aim to develop a versatile diffusion planner that can leverage large-scale inferior data containing task-agnostic sub-optimal trajectories and adapt quickly to specific tasks. In this paper, we propose SODP, a two-stage framework that leverages Sub-Optimal data to learn a Diffusion Planner that generalizes to various downstream tasks. Specifically, in the pre-training stage, we train a foundation…
Taxonomy
Topics: Manufacturing Process and Optimization
Methods: Diffusion
