Auto-TransRL: Autonomous Composition of Vision Pipelines for Robotic Perception
Aditya Kapoor, Nijil George, Vartika Sengar, Vighnesh Vatsal, and Jayavardhana Gubbi

TL;DR
Auto-TransRL introduces a data-driven, adaptive system that uses a Transformer and deep reinforcement learning to automatically compose vision pipelines for robotic perception, reducing reliance on human expertise and trial and error.
Contribution
It presents a novel Transformer-based reinforcement learning framework for automatic construction of vision pipelines, capable of generalizing to unseen algorithms and adapting to environmental changes.
Findings
The system effectively recommends algorithms for vision tasks.
It generalizes well to algorithms unseen during training.
It is robust and adaptive to dynamic environments.
Abstract
Creating a vision pipeline for different datasets to solve a computer vision task is a complex and time-consuming process. Currently, these pipelines are developed with the help of domain experts. Moreover, there is no systematic structure for constructing a vision pipeline apart from relying on experience, trial and error, or template-based approaches. As the search space for choosing suitable algorithms for a particular vision task is large, human exploration for a good solution requires time and effort. To address these issues, we propose a dynamic and data-driven way to identify an appropriate set of algorithms for building the vision pipeline that achieves the goal task. We introduce a Transformer architecture complemented with deep reinforcement learning to recommend algorithms that can be incorporated at different stages of the…
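To make the idea of recommending one algorithm per pipeline stage through reinforcement learning more concrete, here is a minimal sketch. It is not the authors' method: a plain table of softmax preferences trained with REINFORCE stands in for the paper's Transformer policy. The stage names, algorithm names, and `toy_reward` function are all hypothetical, chosen only for illustration.

```python
import math
import random

# Hypothetical stages and candidate algorithms (illustrative, not from the paper).
STAGES = {
    "denoise": ["gaussian_blur", "median_filter", "bilateral_filter"],
    "segment": ["otsu_threshold", "watershed", "kmeans"],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_pipeline(prefs, rng):
    """Sample one algorithm index per stage from the softmax policy."""
    choice = {}
    for stage, scores in prefs.items():
        probs = softmax(scores)
        choice[stage] = rng.choices(range(len(scores)), weights=probs)[0]
    return choice


def reinforce_update(prefs, choice, reward, baseline, lr=0.1):
    """REINFORCE step; the gradient of log-softmax is onehot(chosen) - probs."""
    advantage = reward - baseline
    for stage, idx in choice.items():
        probs = softmax(prefs[stage])
        for j, p in enumerate(probs):
            grad = (1.0 if j == idx else 0.0) - p
            prefs[stage][j] += lr * advantage * grad


def toy_reward(choice):
    """Assumed stand-in for a task metric; a real system would run the pipeline on data."""
    best = {"denoise": 1, "segment": 1}  # arbitrary "good" choices for the toy task
    return sum(1.0 for s, i in choice.items() if best[s] == i) / len(best)


def train(episodes=2000, seed=0):
    """Train the tabular policy and return the learned preferences."""
    rng = random.Random(seed)
    prefs = {s: [0.0] * len(algos) for s, algos in STAGES.items()}
    baseline = 0.0
    for _ in range(episodes):
        choice = sample_pipeline(prefs, rng)
        reward = toy_reward(choice)
        reinforce_update(prefs, choice, reward, baseline)
        baseline += 0.05 * (reward - baseline)  # running-average baseline
    return prefs


def recommend(prefs):
    """Pick the highest-preference algorithm for each stage."""
    return {
        s: STAGES[s][max(range(len(p)), key=p.__getitem__)]
        for s, p in prefs.items()
    }
```

Calling `recommend(train())` returns a dictionary mapping each stage to one algorithm name. A tabular policy like this cannot score algorithms it has never seen, which is the kind of limitation a learned, embedding-based policy such as the paper's Transformer is meant to address.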
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Test · Linear Layer · Layer Normalization · Absolute Position Encodings · Adam · Softmax · Residual Connection · Position-Wise Feed-Forward Layer
