RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback

Amitesh Vatsa; Zhixian Xie; Wanxin Jin

arXiv:2602.00886·cs.RO·February 3, 2026

TL;DR

RoDiF introduces a robust method for fine-tuning diffusion policies in robotics using corrupted human feedback, combining a unified MDP formulation with a conservative optimization strategy to improve preference alignment and robustness.

Contribution

The paper presents RoDiF, an approach that explicitly handles corrupted human preferences during diffusion policy fine-tuning by reinterpreting the DPO objective from a geometric hypothesis-cutting perspective.

Findings

01. Outperforms state-of-the-art baselines in manipulation tasks

02. Maintains performance with up to 30% corrupted preferences

03. Effectively steers policies to human-preferred modes

Abstract

Diffusion policies are a powerful paradigm for robotic control, but fine-tuning them with human preferences is fundamentally challenged by the multi-step structure of the denoising process. To overcome this, we introduce a Unified Markov Decision Process (MDP) formulation that coherently integrates the diffusion denoising chain with environmental dynamics, enabling reward-free Direct Preference Optimization (DPO) for diffusion policies. Building on this formulation, we propose RoDiF (Robust Direct Fine-Tuning), a method that explicitly addresses corrupted human preferences. RoDiF reinterprets the DPO objective through a geometric hypothesis-cutting perspective and employs a conservative cutting strategy to achieve robustness without assuming any specific noise distribution. Extensive experiments on long-horizon manipulation tasks show that RoDiF consistently outperforms state-of-the-art…
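
For orientation, the standard DPO loss that the abstract builds on can be sketched as below. This is the generic pairwise DPO objective, not RoDiF's robust variant, whose conservative cutting rule is not specified in the text above. The names `trajectory_logp` and `dpo_loss`, the `beta` value, and the idea of summing per-step log-probabilities over a combined denoising-and-environment trajectory are illustrative assumptions, not details taken from the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def trajectory_logp(step_logps):
    """Hypothetical helper: total log-probability of a trajectory,
    taken here as the sum of per-step log-probabilities (assumption)."""
    return sum(step_logps)


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: log-probabilities of the preferred / rejected
    trajectory under the policy being fine-tuned.
    ref_logp_w / ref_logp_l: the same quantities under the frozen
    reference policy.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


# Usage: a pair where the fine-tuned policy already favours the
# preferred trajectory more than the reference policy does.
preferred = trajectory_logp([-1.0, -0.5, -0.2])
rejected = trajectory_logp([-1.2, -0.9, -0.6])
loss = dpo_loss(preferred, rejected, ref_logp_w=-2.0, ref_logp_l=-2.0)
```

When a preference label is corrupted (the pair is swapped), this loss pushes the policy toward the wrong trajectory, which is the failure mode a robust method would need to limit.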

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Adversarial Robustness in Machine Learning