Visionary: Vision architecture discovery for robot learning
Iretiayo Akinola, Anelia Angelova, Yao Lu, Yevgen Chebotar, Dmitry Kalashnikov, Jacob Varley, Julian Ibarz, Michael S. Ryoo

TL;DR
Visionary introduces a novel vision-based architecture search method that automatically designs neural networks for robot manipulation. By discovering how to combine and attend image features with actions, the resulting architectures improve task success rates and real-robot grasping performance.
Contribution
It is the first work to demonstrate a successful neural architecture search and attention connectivity search for a real-robot manipulation task.
Findings
Achieves higher task success rates than a recent high-performing baseline, in some cases by a large margin.
Improves grasping performance by 6% on real robots.
Discovers new architectures while training on the task itself.
Abstract
We propose a vision-based architecture search algorithm for robot manipulation learning, which discovers interactions between low-dimensional action inputs and high-dimensional visual inputs. Our approach automatically designs architectures while training on the task, discovering novel ways of combining image feature representations with actions and with features from previous layers, including through attention. The resulting architectures achieve better task success rates than a recent high-performing baseline, in some cases by a large margin. Our real-robot experiments also confirm that the approach improves grasping performance by 6%. This is the first approach to demonstrate a successful neural architecture search and attention connectivity search for a real-robot task.
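To make the idea of searching over "attention connectivity" more concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation, and the paper's actual search procedure may differ. It assumes two hypothetical building blocks: (1) an attention op in which the low-dimensional action vector acts as the query over spatial image features, and (2) a DARTS-style continuous relaxation in which each candidate input to a layer (attended image features, the raw action, a previous layer's features) is weighted by a softmax over learnable architecture logits, with the strongest connection kept after search. The function names (`action_attention`, `mixed_connection`, `discretize`) and all numbers are made up for illustration, and the action is assumed to be already projected to the image-feature width.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def action_attention(action: Vector, image_feats: List[Vector]) -> Vector:
    """Hypothetical op: the action vector is the query over spatial image
    features (scaled dot-product attention). Assumes the action has already
    been projected to the same width as each image feature vector."""
    d = len(action)
    scores = [dot(action, f) / math.sqrt(d) for f in image_feats]
    weights = softmax(scores)
    # Weighted sum of the spatial feature vectors.
    return [sum(w * f[i] for w, f in zip(weights, image_feats)) for i in range(d)]


def mixed_connection(candidates: List[Vector], arch_logits: Sequence[float]) -> Vector:
    """DARTS-style relaxation (an assumed stand-in for the paper's search):
    a softmax-weighted sum of candidate inputs. In a real search the logits
    would be trained jointly with the task weights."""
    alphas = softmax(arch_logits)
    d = len(candidates[0])
    return [sum(a * c[i] for a, c in zip(alphas, candidates)) for i in range(d)]


def discretize(arch_logits: Sequence[float], names: Sequence[str]) -> str:
    """After search, keep only the highest-weighted connection."""
    best = max(range(len(arch_logits)), key=lambda i: arch_logits[i])
    return names[best]


if __name__ == "__main__":
    image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 spatial cells, 2 channels
    action = [2.0, 0.0]       # action already projected to 2 channels (assumption)
    prev_layer = [0.1, 0.2]   # features from an earlier layer
    attended = action_attention(action, image_feats)
    logits = [1.0, 0.0, -1.0]  # hypothetical architecture logits after training
    fused = mixed_connection([attended, action, prev_layer], logits)
    print(discretize(logits, ["attend(image, action)", "action", "prev_layer"]))
    # prints "attend(image, action)"
```

The relaxation is a common design choice because it turns a discrete choice of connections into something differentiable, so the architecture can be adjusted while the network trains on the task, which matches the abstract's description at a high level.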
