Visual Stability Prediction and Its Application to Manipulation
Wenbin Li, Aleš Leonardis, Mario Fritz

TL;DR
This paper introduces a data-driven, end-to-end learning approach to predict the physical stability of wooden block towers directly from appearance, bypassing explicit simulation, and demonstrates its effectiveness through synthetic data and human comparison.
Contribution
It proposes a novel deep learning method for stability prediction that learns directly from visual data, in contrast to traditional simulation-based approaches.
Findings
The model accurately predicts tower stability from images.
It outperforms traditional simulation methods in speed and flexibility.
The approach enables reasoning about future states for stacking tasks.
Abstract
Understanding physical phenomena is a key competence that enables humans and animals to act and interact under uncertain perception in previously unseen environments containing novel objects and their configurations. Developmental psychology has shown that such skills are acquired by infants from observations at a very early stage. In this paper, we contrast a more traditional approach of taking a model-based route with explicit 3D representations and physical simulation with an end-to-end approach that directly predicts stability from appearance. We ask the question if and to what extent and quality such a skill can directly be acquired in a data-driven way, bypassing the need for an explicit simulation at run-time. We present a learning-based approach based on simulated data that predicts stability of towers comprised of wooden blocks under different conditions and quantities…
Taxonomy
Topics: Data Visualization and Analytics · Advanced Vision and Imaging · Robotics and Sensor-Based Localization
