Understanding Multimodal Failure in Action-Chunking Behavioral Cloning
Lorenzo Mazza, Massimiliano Datres, Ariel Rodriguez, Sebastian Bodenstedt, Gitta Kutyniok, Stefanie Speidel

TL;DR
This paper investigates the challenges of multimodal action distributions in behavioral cloning, analyzing different policy parameterizations and their failure modes through experiments on synthetic and robotic tasks.
Contribution
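It analyzes how different multimodal policy parameterizations fail: latent-variable policies through the strength of posterior-prior regularization and the coverage of the prior, and action-space generative policies through the smoothness of the base-to-action transport map.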
Findings
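Posterior-prior regularization makes deployment-time sampling more reliable, but excessive regularization removes the action-conditioned information needed to distinguish demonstrated modes.
Reducing the regularization preserves mode information, but success then depends on whether the prior covers the relevant latent regions.
In action-space generative policies, multimodality is constrained by the smoothness of the base-to-action transport map: a map with a small Lipschitz constant cannot assign substantial probability to many well-separated modes.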
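The first two findings turn on how strongly the posterior is pulled toward the prior. The sketch below shows one common way such a trade-off is written: a reconstruction term plus a weighted KL term. It is only an illustration, not the paper's training objective: the diagonal-Gaussian posterior, the standard-normal prior, the squared-error reconstruction term, and the names `gaussian_kl_to_standard_normal`, `regularized_bc_loss`, and `beta` are all assumptions introduced here.

```python
import numpy as np

def gaussian_kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dims."""
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - 1.0 - log_var, axis=-1)

def regularized_bc_loss(pred_chunk, demo_chunk, mu, log_var, beta):
    """Hypothetical loss: squared error on the action chunk + beta * KL.

    pred_chunk, demo_chunk: (batch, horizon, action_dim)
    mu, log_var:            (batch, latent_dim) posterior parameters
    beta:                   weight on the posterior-prior term (assumed name)
    """
    recon = np.mean((pred_chunk - demo_chunk) ** 2, axis=(-2, -1))
    kl = gaussian_kl_to_standard_normal(mu, log_var)
    return float(np.mean(recon + beta * kl))

# Two demonstrated modes encoded at mu = -2 and mu = +2 pay a KL cost,
# while a posterior that ignores the action (mu = 0) pays none.
separated = gaussian_kl_to_standard_normal(np.array([[-2.0], [2.0]]), np.zeros((2, 1)))
collapsed = gaussian_kl_to_standard_normal(np.zeros((2, 1)), np.zeros((2, 1)))
print("KL with separated modes:", separated)
print("KL with collapsed posterior:", collapsed)
```

Under these assumptions, a large `beta` rewards the collapsed posterior, which matches the prior but no longer tells the modes apart. A small `beta` keeps the posterior means separated, and sampling from the prior at deployment then has to land in those separated regions.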
Abstract
Behavioral cloning becomes difficult when the same observation admits several valid actions. We study this problem for action-chunking policies and show that different multimodal parameterizations fail in different ways. For latent-variable policies, posterior-prior regularization makes deployment-time sampling more reliable, but excessive regularization removes the action-conditioned information needed to distinguish demonstrated modes. Reducing this regularization can preserve mode information, but then success depends on whether the prior covers the relevant latent regions. For action-space generative policies, multimodality is constrained by the smoothness of the base-to-action transport: a map with small Lipschitz constant cannot assign substantial probability to many well-separated modes. Covering many modes therefore requires either sharp transitions in base space or off-support…
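The Lipschitz argument in the abstract can be illustrated with a one-dimensional toy example. The sketch below is not taken from the paper: the uniform base distribution, the equally spaced mode centers, the tolerance `eps`, and the helper names `staircase_map` and `on_mode_mass` are assumptions made for illustration. The map is a staircase whose jumps are replaced by ramps of slope `lipschitz`, and the script measures how much base mass lands within `eps` of a mode center.

```python
import numpy as np

def staircase_map(u, centers, lipschitz):
    """Hypothetical monotone map from a Uniform(0, 1) base to 1-D actions.

    Flat plateaus sit on each mode center; neighbouring plateaus are joined
    by linear ramps whose slope equals `lipschitz`.
    """
    centers = np.asarray(centers, dtype=float)
    k = len(centers)
    gap = centers[1] - centers[0]           # assumes equally spaced modes
    ramp = gap / lipschitz                  # base-space width of one ramp
    plateau = (1.0 - (k - 1) * ramp) / k    # base-space width of one plateau
    if plateau <= 0.0:
        # The ramps alone overflow [0, 1]: the map is a straight line of
        # slope `lipschitz` and never reaches the farthest modes.
        return centers[0] + lipschitz * u
    starts = np.arange(k) * (plateau + ramp)
    u_knots = np.stack([starts, starts + plateau], axis=1).ravel()
    x_knots = np.repeat(centers, 2)
    return np.interp(u, u_knots, x_knots)

def on_mode_mass(centers, lipschitz, eps=0.05, n=200_000):
    """Fraction of base mass mapped to within `eps` of some mode center."""
    centers = np.asarray(centers, dtype=float)
    u = (np.arange(n) + 0.5) / n            # deterministic grid over the base
    x = staircase_map(u, centers, lipschitz)
    dist = np.abs(x[:, None] - centers[None, :]).min(axis=1)
    return float(np.mean(dist <= eps))

centers = np.arange(5.0)                    # five modes, one unit apart
for lipschitz in (2.0, 8.0, 40.0, 400.0):
    mass = on_mode_mass(centers, lipschitz)
    print(f"Lipschitz constant {lipschitz:6.1f}: on-mode mass {mass:.3f}")
```

In this toy setting every ramp between two modes consumes base mass proportional to the mode separation divided by the slope, and that mass lands between the modes. A smooth map therefore either skips modes or spends much of its probability off the modes, and only a steep map places most of the mass on all five.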
