Deep Instance Segmentation and Visual Servoing to Play Jenga with a Cost-Effective Robotic System
Luca Marchionna, Giulio Pugliese, Mauro Martini, Simone Angarano, Francesco Salvetti, Marcello Chiaberge

TL;DR
This paper presents a cost-effective robotic system using deep learning and visual-tactile integration to play Jenga, demonstrating precise block extraction with a standard manipulator and inexpensive sensors.
Contribution
The authors develop a novel architecture combining deep instance segmentation, visual control, and force sensing for robotic Jenga, using low-cost components and synthetic training data.
Findings
Achieved up to 14 consecutive block extractions.
Successfully integrated deep learning with tactile sensing for manipulation.
Demonstrated precise block removal with a standard robotic arm.
Abstract
The game of Jenga represents an inspiring benchmark for developing innovative manipulation solutions for complex tasks. Indeed, it encouraged the study of novel robotics methods to successfully extract blocks from the tower. A Jenga game round undoubtedly embeds many traits of complex industrial or surgical manipulation tasks, requiring a multi-step strategy, the combination of visual and tactile data, and the highly precise motion of the robotic arm to perform a single block extraction. In this work, we propose a novel, cost-effective architecture for playing Jenga with e.Do, a 6-DOF anthropomorphic manipulator manufactured by Comau, a standard depth camera, and an inexpensive monodirectional force sensor. Our solution focuses on a visual-based control strategy to accurately align the end-effector with the desired block, enabling block extraction by pushing. To this aim, we train an…
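As a rough illustration of the alignment idea described in the abstract (use a segmented block to steer the end-effector, then push while monitoring force), the following is a minimal sketch and not the authors' implementation. The function names, the proportional gain, the pixel tolerance, and the force threshold are all hypothetical. It assumes a binary mask for the chosen block is already available from some instance segmentation model, and it works only in pixel space; converting the correction into a robot motion would need camera calibration that is not shown here.

```python
import numpy as np


def mask_centroid(mask):
    """Return the (x, y) pixel centroid of a binary mask, or None if it is empty."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())


def alignment_step(mask, target_px, gain=0.5, tol_px=2.0):
    """One proportional correction step in image space (hypothetical helper).

    Returns (correction, aligned), where correction is gain * (target - centroid)
    in pixels and aligned is True once the error norm is within tol_px.
    """
    centroid = mask_centroid(mask)
    if centroid is None:
        # Block not visible: no correction can be computed.
        return None, False
    error = np.asarray(target_px, dtype=float) - np.asarray(centroid)
    aligned = bool(np.linalg.norm(error) <= tol_px)
    return gain * error, aligned


def should_stop_push(force_reading, max_force=1.0):
    """Abort the push if the single-axis force reading exceeds an assumed limit."""
    return force_reading > max_force


# Example: a 2x2 block mask whose centroid is at pixel (4.5, 2.5).
mask = np.zeros((10, 10), dtype=bool)
mask[2:4, 4:6] = True
correction, aligned = alignment_step(mask, target_px=(6.5, 2.5), tol_px=1.0)
print(correction, aligned)
```

In a loop, a step like this would be repeated on each new camera frame until `aligned` is true, after which the push would proceed while `should_stop_push` is checked against the force sensor.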
Taxonomy
Topics: Robot Manipulation and Learning · Tactile and Sensory Interactions · Advanced Vision and Imaging
Methods: ALIGN
