VP-GO: a "light" action-conditioned visual prediction model
Anji Ma, Yoann Fleytoux, Jean-Baptiste Mouret, Serena Ivaldi

TL;DR
VP-GO is a lightweight, stochastic, action-conditioned visual prediction model designed for robotic grasping, offering improved qualitative predictions of complex grasps while maintaining computational efficiency.
Contribution
The paper introduces VP-GO, a novel lightweight stochastic visual prediction model with hierarchical action decomposition, and releases a new dataset for robotic grasp prediction.
Findings
VP-GO performs comparably to more complex models on signal prediction metrics.
It outperforms those models in qualitative prediction of complex robotic grasps.
Compatible with existing datasets like RoboNet and PandaGrasp.
Abstract
Visual prediction models are a promising solution for vision-based robotic grasping of cluttered, unknown soft objects. Previous models from the literature are computationally expensive, which limits reproducibility; although some account for stochasticity in the prediction model, it is often too weak to capture the reality of robotics experiments involving grasping such objects. Furthermore, previous work focused on elementary movements, which are inefficient for reasoning in terms of more complex semantic actions. To address these limitations, we propose VP-GO, a "light" stochastic action-conditioned visual prediction model. We propose a hierarchical decomposition of semantic grasping and manipulation actions into elementary end-effector movements, to ensure compatibility with existing models and datasets for visual prediction of robotic actions such as RoboNet. We also record and release a new…
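To make the idea of decomposing a semantic action into elementary end-effector movements more concrete, here is a minimal Python sketch. It is not the authors' implementation: the names `ElementaryMove`, `_steps`, and `decompose_grasp`, the approach/descend/close/lift sequence, and the `max_step` parameter are all hypothetical choices made for illustration, since this summary does not specify how VP-GO performs the decomposition.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ElementaryMove:
    """Hypothetical low-level command: a Cartesian displacement plus a gripper state."""
    dx: float
    dy: float
    dz: float
    gripper_closed: bool


def _steps(delta: Vec3, max_step: float, closed: bool) -> List[ElementaryMove]:
    """Split one displacement into equal moves no longer than max_step."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm == 0.0:
        return []
    # The small tolerance avoids an extra step caused by floating-point rounding.
    n = max(1, math.ceil(norm / max_step - 1e-9))
    return [ElementaryMove(delta[0] / n, delta[1] / n, delta[2] / n, closed)
            for _ in range(n)]


def decompose_grasp(start: Vec3, target: Vec3, max_step: float = 0.05) -> List[ElementaryMove]:
    """Expand an assumed semantic 'grasp at target' action into elementary moves."""
    sx, sy, sz = start
    tx, ty, tz = target
    moves: List[ElementaryMove] = []
    # 1. Move horizontally above the target, gripper open.
    moves += _steps((tx - sx, ty - sy, 0.0), max_step, closed=False)
    # 2. Move vertically to the target height, gripper open.
    moves += _steps((0.0, 0.0, tz - sz), max_step, closed=False)
    # 3. Close the gripper without moving.
    moves.append(ElementaryMove(0.0, 0.0, 0.0, True))
    # 4. Return to the starting height, gripper closed.
    moves += _steps((0.0, 0.0, sz - tz), max_step, closed=True)
    return moves


if __name__ == "__main__":
    for move in decompose_grasp((0.0, 0.0, 4.0), (3.0, 0.0, 1.0), max_step=1.0):
        print(move)
```

A sequence in this form is simply a list of per-step displacement and gripper commands, which is the kind of low-level action format that the abstract says keeps the model compatible with existing datasets such as RoboNet.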
Taxonomy
Topics: Robot Manipulation and Learning · Reinforcement Learning in Robotics · Human Pose and Action Recognition
