ImplicitRDP: An End-to-End Visual-Force Diffusion Policy with Structural Slow-Fast Learning

Wendi Chen; Han Xue; Yi Wang; Fangyuan Zhou; Jun Lv; Yang Jin; Shirun Tang; Chuan Wen; Cewu Lu

arXiv:2512.10946·cs.RO·December 12, 2025

TL;DR

ImplicitRDP is an end-to-end visual-force diffusion policy that integrates asynchronous visual and force signals through Structural Slow-Fast Learning, allowing closed-loop adjustments at the force frequency during contact-rich manipulation.

Contribution

The paper presents a novel unified policy architecture with causal attention and a regularization technique to handle asynchronous modalities in contact-rich tasks.

Findings

01 Outperforms vision-only and hierarchical baselines

02 Achieves higher success rates in contact-rich manipulation

03 Demonstrates improved reactivity and temporal coherence

Abstract

Human-level contact-rich manipulation relies on the distinct roles of two key modalities: vision provides spatially rich but temporally slow global context, while force sensing captures rapid, high-frequency local contact dynamics. Integrating these signals is challenging due to their fundamental frequency and informational disparities. In this work, we propose ImplicitRDP, a unified end-to-end visual-force diffusion policy that integrates visual planning and reactive force control within a single network. We introduce Structural Slow-Fast Learning, a mechanism utilizing causal attention to simultaneously process asynchronous visual and force tokens, allowing the policy to perform closed-loop adjustments at the force frequency while maintaining the temporal coherence of action chunks. Furthermore, to mitigate modality collapse where end-to-end models fail to adjust the weights across…
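The causal-attention idea in the abstract can be illustrated with a small attention mask. This is a minimal sketch under stated assumptions, not the authors' implementation: the token layout (visual tokens, then per-step force tokens, then per-step action tokens), the function name `slow_fast_mask`, and the rule that the action token at step t sees only force tokens up to step t are all hypothetical choices made for illustration.

```python
import numpy as np


def slow_fast_mask(num_visual: int, horizon: int) -> np.ndarray:
    """Build a boolean attention mask (True = query row may attend to key column).

    Hypothetical token layout, assumed for this sketch only:
        [visual tokens | force f_0..f_{H-1} | action a_0..a_{H-1}]
    """
    n = num_visual + 2 * horizon
    mask = np.zeros((n, n), dtype=bool)

    f0 = num_visual            # index of the first force token
    a0 = num_visual + horizon  # index of the first action token

    # Slow stream: every token may read the visual context.
    mask[:, :num_visual] = True

    # Fast stream: lower-triangular (causal) pattern over time steps.
    causal = np.tril(np.ones((horizon, horizon), dtype=bool))
    mask[f0:f0 + horizon, f0:f0 + horizon] = causal  # force t sees force <= t
    mask[a0:a0 + horizon, f0:f0 + horizon] = causal  # action t sees force <= t
    mask[a0:a0 + horizon, a0:a0 + horizon] = causal  # action t sees action <= t
    return mask


if __name__ == "__main__":
    # 2 visual tokens and a 3-step chunk give an 8x8 mask.
    print(slow_fast_mask(num_visual=2, horizon=3).astype(int))
```

Under this assumed layout, a new force reading only affects actions at the same or later steps, so earlier actions in the chunk need not change when force arrives at a higher rate than images.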

Code & Models

Models

WendiChen/ImplicitRDP_model

Datasets

WendiChen/ImplicitRDP_dataset (20 downloads)

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Motor Control and Adaptation