DiSPo: Diffusion-SSM based Policy Learning for Coarse-to-Fine Action Discretization
Nayoung Oh, Jaehyeong Jang, Moonkyeong Jung, Daehyung Park

TL;DR
DiSPo introduces a diffusion-based policy model that learns from coarse skills to generate fine-grained actions efficiently, outperforming baselines in coarse-to-fine tasks and demonstrating scalability in real-world scenarios.
Contribution
A novel diffusion-SSM policy (DiSPo) that enables coarse-to-fine skill learning from demonstrations with improved efficiency and scalability.
Findings
DiSPo achieves up to 81% higher success rate than baselines across three coarse-to-fine benchmark tests.
It improves inference efficiency by generating coarse motions in less critical regions.
It demonstrates scalability in simulation and real-world tasks.
Abstract
We aim to solve the problem of generating coarse-to-fine skills in learning from demonstrations (LfD). To scale precision, traditional LfD approaches often rely on extensive fine-grained demonstrations with external interpolations, or on dynamics models with limited generalization capabilities. For memory-efficient learning and convenient granularity change, we propose a novel diffusion-state space model (SSM) based policy (DiSPo) that learns from diverse coarse skills and produces actions at varying control scales by leveraging an SSM, Mamba. Our evaluations show that the adoption of Mamba and the proposed step-scaling method enable DiSPo to outperform baselines in three coarse-to-fine benchmark tests, with up to 81% higher success rate. In addition, DiSPo improves inference efficiency by generating coarse motions in less critical regions. We finally demonstrate the scalability of actions…
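The abstract does not spell out how the step-scaling method works, so the following is only a generic illustration of why an SSM makes a change of granularity convenient. It is not DiSPo's or Mamba's actual code. It assumes a toy diagonal continuous-time SSM discretized with a zero-order hold, and the names `zoh_discretize` and `rollout` and all parameter values are hypothetical. Shrinking the step size by a factor of 4 yields 4 fine outputs per coarse step, and under a held input the fine trajectory passes through the coarse one.

```python
import numpy as np

def zoh_discretize(a, b, delta):
    """Zero-order-hold discretization of the diagonal system x' = a*x + b*u."""
    a_bar = np.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def rollout(a, b, c, u, delta):
    """Run the discrete recurrence; each input in u is held for one step of size delta."""
    a_bar, b_bar = zoh_discretize(a, b, delta)
    x = np.zeros_like(a)
    ys = []
    for u_k in u:
        x = a_bar * x + b_bar * u_k   # state update
        ys.append(float(np.dot(c, x)))  # scalar readout
    return np.array(ys)

# Hypothetical toy parameters (not taken from the paper).
a = np.array([-1.0, -0.5, -2.0])
b = np.array([1.0, 0.5, -0.3])
c = np.array([0.2, 1.0, 0.7])

coarse_u = np.array([1.0, 0.0, -1.0, 0.5])  # one input per coarse step
delta = 0.4                                  # coarse step size
scale = 4                                    # refinement factor

# Fine rollout: hold each coarse input for `scale` steps of size delta / scale.
fine_u = np.repeat(coarse_u, scale)
coarse_y = rollout(a, b, c, coarse_u, delta)
fine_y = rollout(a, b, c, fine_u, delta / scale)

# Every `scale`-th fine output coincides with the coarse output.
print(np.allclose(fine_y[scale - 1::scale], coarse_y))
```

The same parameters `a`, `b`, `c` serve both resolutions; only the step size changes. That is the property this sketch is meant to show, under the stated assumptions.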
Taxonomy
Topics: Human Pose and Action Recognition · Context-Aware Activity Recognition Systems
