KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN & RWKV
Zhihao Chen, Yiyuan Ge, Ziyang Wang

TL;DR
This paper introduces KAN-We-Flow, a lightweight flow-matching policy for 3D robotic manipulation. It replaces the heavy UNet backbone with RWKV-KAN blocks and adds an auxiliary loss, reaching state-of-the-art results with far fewer parameters.
Contribution
The work presents a new lightweight backbone combining RWKV and KAN architectures, along with an auxiliary loss, to improve efficiency and performance in robotic manipulation tasks.
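To make the KAN half of the backbone more concrete, here is a minimal NumPy sketch of a groupwise learnable functional mapping. It is an illustration under stated assumptions, not the authors' GroupKAN layer: the names `hat_basis` and `group_kan`, the piecewise-linear (order-1 B-spline) basis, the grid, and the tensor sizes are all hypothetical, and the RWKV time/channel mixing is omitted entirely. The idea shown is only that channels are split into groups and each group shares one learnable 1-D function expressed as a weighted sum of basis functions.

```python
import numpy as np

def hat_basis(x, grid):
    """Order-1 B-spline (hat) basis on a uniform grid; output shape (..., len(grid))."""
    h = grid[1] - grid[0]
    return np.clip(1.0 - np.abs(x[..., None] - grid) / h, 0.0, None)

def group_kan(x, coeffs, grid):
    """Apply one learnable 1-D function per channel group (hypothetical sketch).

    x:      (batch, channels) activations
    coeffs: (groups, len(grid)) coefficients, shared by all channels in a group
    """
    batch, channels = x.shape
    groups = coeffs.shape[0]
    xg = x.reshape(batch, groups, channels // groups)   # split channels into groups
    basis = hat_basis(xg, grid)                         # (batch, groups, per_group, K)
    out = np.einsum("bgck,gk->bgc", basis, coeffs)      # weighted sum of basis functions
    return out.reshape(batch, channels)

# Assumed sizes: 8 channels in 2 groups, 9 grid points on [-2, 2].
grid = np.linspace(-2.0, 2.0, 9)
# In training these coefficients would be learned; here they are fixed by hand:
# group 0 reproduces the identity, group 1 a piecewise-linear fit of x**2.
coeffs = np.stack([grid, grid ** 2])
x = np.random.default_rng(0).uniform(-2.0, 2.0, size=(3, 8))
y = group_kan(x, coeffs, grid)
```

Sharing one function per group, rather than one per channel pair, is what keeps the parameter count small in this sketch: the layer above has only `groups * len(grid)` coefficients regardless of the number of channels.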
Findings
Reduces model parameters by 86.8% compared to UNet-based models.
Achieves state-of-the-art success rates on multiple benchmarks.
Maintains fast inference speed with a lightweight design.
Abstract
Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet backbones, which hinders deployment on resource-constrained robots. Flow matching alleviates the sampling burden by learning a one-step vector field, yet prior implementations still inherit large UNet-style architectures. In this work, we present KAN-We-Flow, a flow-matching policy that draws on recent advances in Receptance Weighted Key Value (RWKV) and Kolmogorov-Arnold Networks (KAN) from vision to build a lightweight and highly expressive backbone for 3D manipulation. Concretely, we introduce an RWKV-KAN block: an RWKV first performs efficient time/channel mixing to propagate task context, and a subsequent GroupKAN layer applies learnable spline-based, groupwise functional mappings to perform…
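The contrast the abstract draws between many-step denoising and flow matching can be sketched in a few lines of NumPy. This is a generic illustration of conditional flow matching with a straight-line path, not the KAN-We-Flow implementation: the function names, the 7-dimensional action size, and the `oracle` velocity field (a stand-in for a trained network that already knows the target) are assumptions made for the example.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Straight-line path between noise x0 and data x1.

    Returns the interpolated point x_t and the regression target
    u_t = x1 - x0 that a velocity network would be trained to predict.
    """
    t = t[:, None]
    x_t = (1.0 - t) * x0 + t * x1
    u_t = x1 - x0
    return x_t, u_t

def euler_sample(velocity_fn, x0, num_steps=1):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with forward Euler."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 7))    # 4 samples, 7-D actions (hypothetical sizes)
target = rng.standard_normal((4, 7))   # stand-in for demonstrated actions

# Training side: sample a time per example and build (input, target) pairs.
x_t, u_t = cfm_training_pair(noise, target, rng.uniform(size=4))

# Sampling side: an oracle velocity field replaces the trained network here.
oracle = lambda x, t: (target - x) / (1.0 - t)
one_step = euler_sample(oracle, noise, num_steps=1)
four_step = euler_sample(oracle, noise, num_steps=4)
```

With an ideal straight-line velocity field, a single Euler step already lands on the target, which is the property that lets flow-matching policies avoid long denoising chains. A learned field is only approximately straight, so real policies may still trade a few extra steps for accuracy.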
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Generative Adversarial Networks and Image Synthesis
