Auto-TransRL: Autonomous Composition of Vision Pipelines for Robotic Perception

Aditya Kapoor; Nijil George; Vartika Sengar; Vighnesh Vatsal; Jayavardhana Gubbi

arXiv:2209.02991·cs.CV·September 8, 2022

Open Access

TL;DR

Auto-TransRL introduces a data-driven, adaptive system using Transformer and Deep Reinforcement Learning to automatically compose vision pipelines for robotic perception, reducing reliance on human expertise and trial-and-error.

Contribution

It presents a novel Transformer-based reinforcement learning framework for automatic construction of vision pipelines, capable of generalizing to unseen algorithms and adapting to environmental changes.

Findings

01. The system effectively recommends algorithms for vision tasks.

02. At test time, it generalizes well to algorithms not seen during training.

03. It is robust and adapts to dynamic environments.

Abstract

Creating a vision pipeline for different datasets to solve a computer vision task is a complex and time-consuming process. Currently, these pipelines are developed with the help of domain experts. Moreover, there is no systematic structure for constructing a vision pipeline apart from relying on experience, trial and error, or template-based approaches. As the search space for choosing suitable algorithms for a particular vision task is large, human exploration for a good solution requires time and effort. To address these issues, we propose a dynamic and data-driven way to identify an appropriate set of algorithms for building the vision pipeline that achieves the goal task. We introduce a Transformer architecture complemented with Deep Reinforcement Learning to recommend algorithms that can be incorporated at different stages of the…

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Layer Normalization · Absolute Position Encodings · Adam · Softmax · Residual Connection · Position-Wise Feed-Forward Layer
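
Several of the methods listed above (Multi-Head Attention, Softmax) build on scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The sketch below shows a single attention head in plain Python, as a generic illustration of that building block. It is not the paper's implementation: the function names and the toy query, key, and value vectors are hypothetical and chosen only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention, softmax(Q K^T / sqrt(d_k)) V, on nested lists."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical toy inputs (assumed for illustration, not from the paper).
queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

outputs, weights = scaled_dot_product_attention(queries, keys, values)
print(weights[0])  # attention weights over the three keys
print(outputs[0])  # weighted combination of the value vectors
```

A multi-head layer runs several such heads on learned linear projections of the inputs and concatenates the results. This sketch omits the projections to keep the core computation visible.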