TL;DR
UberNet is a unified CNN architecture that performs multiple low-, mid-, and high-level vision tasks simultaneously. It is trained on diverse datasets under a limited memory budget and achieves competitive results at roughly 0.7 seconds per frame.
Contribution
This work introduces UberNet, an end-to-end trainable CNN that handles a wide range of vision tasks within a single model. It addresses two challenges: training on diverse datasets and fitting many tasks into a limited memory budget.
Findings
Handles multiple vision tasks simultaneously
Runs at about 0.7 seconds per frame
Maintains competitive accuracy across tasks
Abstract
In this work we introduce a convolutional neural network (CNN) that jointly handles low-, mid-, and high-level vision tasks in a unified architecture that is trained end-to-end. Such a universal network can act like a 'swiss knife' for vision tasks; we call this architecture an UberNet to indicate its overarching nature. We address two main technical challenges that emerge when broadening up the range of tasks handled by a single CNN: (i) training a deep architecture while relying on diverse training sets and (ii) training many (potentially unlimited) tasks with a limited memory budget. Properly addressing these two problems allows us to train accurate predictors for a host of tasks, without compromising accuracy. Through these advances we train in an end-to-end manner a CNN that simultaneously addresses (a) boundary detection, (b) normal estimation, (c) saliency estimation, (d)…
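Challenge (i) in the abstract, training on diverse training sets, means that a given image may carry ground truth for only some of the tasks. The sketch below illustrates one generic way to handle this: average each task's loss only over the samples that are annotated for it. It is a minimal illustration under that assumption, not the authors' implementation; the function name `masked_multitask_loss` and its arguments are hypothetical.

```python
def masked_multitask_loss(per_task_losses, has_label):
    """Mean loss per task, counting only samples annotated for that task.

    per_task_losses: list of rows, one per sample, each a list of per-task losses.
    has_label: same shape, True where the sample has ground truth for the task.
    Tasks with no annotated sample in the batch get a loss of 0.0.
    (Hypothetical helper for illustration only.)
    """
    num_tasks = len(per_task_losses[0])
    totals = [0.0] * num_tasks
    counts = [0] * num_tasks
    for losses, labeled in zip(per_task_losses, has_label):
        for t in range(num_tasks):
            if labeled[t]:
                # Only annotated (sample, task) pairs contribute to the loss.
                totals[t] += losses[t]
                counts[t] += 1
    return [totals[t] / counts[t] if counts[t] else 0.0 for t in range(num_tasks)]


# Two samples, two tasks; the first sample lacks a label for task 1.
batch_losses = [[1.0, 2.0], [3.0, 4.0]]
batch_labels = [[True, False], [True, True]]
task_means = masked_multitask_loss(batch_losses, batch_labels)
```

Here the unlabeled entry is skipped rather than treated as a zero-loss example, so a task with sparse annotations is not diluted by samples drawn from other datasets.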
