Learning Probabilistic Multi-Modal Actor Models for Vision-Based Robotic Grasping
Mengyuan Yan, Adrian Li, Mrinal Kalakrishnan, Peter Pastor

TL;DR
This paper introduces a neural density model for vision-based robotic grasping that directly predicts the distribution of successful grasps from input images, reducing inference time by a factor of 3 while achieving grasping performance similar to CEM-based inference.
Contribution
The paper presents a neural density model that combines a Gaussian mixture with normalizing flows to represent multi-modal distributions over grasp poses, so that a grasp can be predicted directly instead of being found through run-time optimization over a value network.
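To make the combination of a Gaussian mixture and a normalizing flow more concrete, here is a minimal NumPy sketch. It is not the authors' architecture: the single affine coupling layer, the function names, and the fixed toy parameters in `toy_params` are all assumptions chosen for illustration. In an actual actor model these parameters would be predicted by a neural network from the input image. The sketch only shows how a multi-modal base distribution and an invertible map together give both samples and exact log densities via the change-of-variables formula.

```python
import numpy as np

def gmm_log_prob(z, weights, means, stds):
    """Log density of a diagonal Gaussian mixture at points z of shape (N, D)."""
    diff = (z[:, None, :] - means) / stds                      # (N, K, D)
    log_comp = -0.5 * (diff ** 2 + 2.0 * np.log(stds)
                       + np.log(2.0 * np.pi)).sum(axis=-1)     # (N, K)
    a = log_comp + np.log(weights)
    m = a.max(axis=1, keepdims=True)                           # log-sum-exp trick
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def gmm_sample(rng, n, weights, means, stds):
    """Draw n samples: pick a component, then sample its Gaussian."""
    k = rng.choice(len(weights), size=n, p=weights)
    return means[k] + stds[k] * rng.standard_normal((n, means.shape[1]))

def coupling_forward(z, w, b):
    """Affine coupling layer: the first half passes through unchanged and
    determines the scale and shift applied to the second half."""
    d = z.shape[1] // 2
    z1, z2 = z[:, :d], z[:, d:]
    log_s = np.tanh(z1 @ w)                                    # bounded log-scale
    t = z1 @ b
    x = np.concatenate([z1, z2 * np.exp(log_s) + t], axis=1)
    return x, log_s.sum(axis=1)                                # log|det J|

def coupling_inverse(x, w, b):
    """Exact inverse of coupling_forward, with the inverse log-determinant."""
    d = x.shape[1] // 2
    x1, x2 = x[:, :d], x[:, d:]
    log_s = np.tanh(x1 @ w)
    t = x1 @ b
    z = np.concatenate([x1, (x2 - t) * np.exp(-log_s)], axis=1)
    return z, -log_s.sum(axis=1)

def flow_log_prob(x, p):
    """log p(x) = log p_base(f^{-1}(x)) + log|det d f^{-1} / dx|."""
    z, log_det_inv = coupling_inverse(x, p["w"], p["b"])
    return gmm_log_prob(z, p["weights"], p["means"], p["stds"]) + log_det_inv

def sample_grasps(rng, n, p):
    """Sample from the mixture, then push the samples through the flow."""
    z = gmm_sample(rng, n, p["weights"], p["means"], p["stds"])
    x, _ = coupling_forward(z, p["w"], p["b"])
    return x

def toy_params():
    """Hypothetical fixed parameters for a 4-D action with two modes."""
    return {
        "weights": np.array([0.5, 0.5]),
        "means": np.array([[-1.0, -1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]),
        "stds": np.full((2, 4), 0.3),
        "w": 0.5 * np.eye(2),
        "b": 0.1 * np.ones((2, 2)),
    }
```

Calling `sample_grasps(np.random.default_rng(0), 8, toy_params())` returns eight 4-D samples split between the two modes, and `flow_log_prob` scores any candidate action under the same model, which is the quantity a maximum-likelihood training loss would use.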
Findings
Achieves grasping performance similar to a value network that uses CEM for inference.
Reduces inference time by a factor of 3 compared to the state of the art.
Demonstrated in both simulation and real-robot experiments.
Abstract
Many previous works approach vision-based robotic grasping by training a value network that evaluates grasp proposals. These approaches require an optimization process at run-time to infer the best action from the value network. As a result, the inference time grows exponentially as the dimension of the action space increases. We propose an alternative method: directly training a neural density model to approximate the conditional distribution of successful grasp poses given the input images. We construct a neural network that combines a Gaussian mixture with normalizing flows and is able to represent multi-modal, complex probability distributions. We demonstrate, both in simulation and on a real robot, that the proposed actor model achieves performance similar to that of the value network using the Cross-Entropy Method (CEM) for inference, on top-down grasping with a 4-dimensional action…
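For contrast, the run-time optimization that value-network approaches rely on can be sketched as a basic Cross-Entropy Method loop. This is a generic CEM sketch in NumPy, not the paper's implementation: `q_fn` stands in for a trained value network, and the iteration count, sample count, and elite count are illustrative assumptions rather than values taken from the paper.

```python
import numpy as np

def cem_argmax(q_fn, dim, rng, iters=3, samples=64, elites=6):
    """Approximate argmax_a q_fn(a) with the Cross-Entropy Method.

    q_fn maps an array of actions with shape (samples, dim) to scores of
    shape (samples,). Hyperparameter defaults are illustrative only.
    """
    mean = np.zeros(dim)
    std = np.ones(dim)
    for _ in range(iters):
        # Sample candidate actions from the current Gaussian proposal.
        actions = mean + std * rng.standard_normal((samples, dim))
        # Every iteration needs a batch of value-network evaluations.
        scores = q_fn(actions)
        # Refit the proposal to the highest-scoring candidates.
        elite = actions[np.argsort(scores)[-elites:]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean

def toy_q(actions, target=np.array([0.5, -0.3, 0.2, 0.1])):
    """Hypothetical stand-in for a value network: peaks at `target`."""
    return -np.sum((actions - target) ** 2, axis=1)
```

Each call to `cem_argmax` evaluates the value function `iters * samples` times, whereas an actor model of the kind described in the abstract produces a grasp by sampling from its predicted distribution without an inner optimization loop.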
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
