Information Filtering via Variational Regularization for Robot Manipulation

Jinhao Zhang; Wenlong Xia; Yaojia Wang; Zhexuan Zhou; Huizhe Li; Yichen Lai; Haoming Song; Youmin Gong; Jie Mei

arXiv:2601.21926·cs.RO·May 12, 2026

TL;DR

This paper introduces Variational Regularization (VR), a plug-and-play module that improves diffusion-based visuomotor policies by filtering task-irrelevant noise out of intermediate features, which raises robotic manipulation success rates.

Contribution

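The paper proposes a plug-and-play variational regularization method that imposes an adaptive information bottleneck on intermediate features, using a context-conditioned Gaussian and a KL-divergence regularizer, and improves on prior state-of-the-art results in robotic manipulation tasks.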

Findings

01 Consistently improves task success rates on RoboTwin2.0, Adroit, and MetaWorld benchmarks.

02 Achieves new state-of-the-art results in simulated robotic manipulation.

03 Demonstrates effective real-world deployment performance.

Abstract

Diffusion-based visuomotor policies built on 3D visual representations have achieved strong performance in learning complex robotic skills. However, most existing methods employ an oversized denoising decoder. While increasing model capacity can improve denoising, empirical evidence suggests that it also introduces redundancy and noise in intermediate feature blocks. Crucially, we find that randomly masking backbone features in U-Net or skipping intermediate layers in DiT at inference time (without changing training) can improve performance, confirming the presence of task-irrelevant noise in intermediate features. To this end, we propose Variational Regularization (VR), a plug-and-play module that imposes a context-conditioned Gaussian over the noisy features and applies a KL-divergence regularizer, forming an adaptive information bottleneck. Extensive experiments on three simulation…
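The abstract describes VR only at a high level, so the following is a minimal NumPy sketch of one way a context-conditioned Gaussian with a KL regularizer could be wired up, not the authors' implementation. The function name `vr_bottleneck`, the linear heads `W_mu` and `W_logvar`, the standard-normal prior, and the weight `beta` are all assumptions introduced for illustration; the paper may parameterize each of them differently.

```python
import numpy as np

def vr_bottleneck(feat, ctx, W_mu, W_logvar, rng, sample=True):
    """Hypothetical variational bottleneck over intermediate features.

    feat: (batch, d_feat) noisy intermediate features
    ctx:  (batch, d_ctx) conditioning context (e.g. observation embedding)
    W_mu, W_logvar: (d_feat + d_ctx, d_out) linear heads (assumed form)
    Returns the filtered features z and the mean KL term.
    """
    # Condition the Gaussian on the context by concatenating it to the features.
    h = np.concatenate([feat, ctx], axis=-1)
    mu = h @ W_mu
    logvar = h @ W_logvar

    if sample:
        # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
        eps = rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps
    else:
        # Deterministic path: use the mean.
        z = mu

    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over feature dims.
    # The N(0, I) prior is an assumption of this sketch.
    kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)
    return z, kl.mean()

# Usage with random placeholder data (shapes are arbitrary).
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 16))
ctx = rng.standard_normal((4, 8))
W_mu = 0.1 * rng.standard_normal((24, 16))
W_logvar = 0.1 * rng.standard_normal((24, 16))

z, kl = vr_bottleneck(feat, ctx, W_mu, W_logvar, rng)

beta = 1e-3          # hypothetical regularization weight
denoise_loss = 0.0   # placeholder for the policy's usual denoising loss
total_loss = denoise_loss + beta * kl
```

In this sketch the KL term penalizes information carried by `z` beyond what the prior provides, so a larger `beta` tightens the bottleneck, while the context-dependent mean and variance let the amount of filtering vary per input.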
