Webly Supervised Concept Expansion for General Purpose Vision Models
Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi

TL;DR
This paper introduces a webly supervised approach that expands the concepts known to general purpose vision (GPV) models by learning those concepts from web image search results, reducing data collection costs while improving performance across multiple visual tasks.
Contribution
The paper presents a webly supervised method for concept expansion in GPVs, introduces the GPV-2 architecture, and reports better performance than GPV-1 and VL-T5 across benchmarks.
Findings
GPV-2 outperforms GPV-1 and VL-T5 across benchmarks.
Web data significantly enhances GPV capabilities.
Effective concept expansion reduces data collection costs.
Abstract
General Purpose Vision (GPV) systems are models that are designed to solve a wide array of visual tasks without requiring architectural changes. Today, GPVs primarily learn both skills and concepts from large fully supervised datasets. Scaling GPVs to tens of thousands of concepts by acquiring data to learn each concept for every skill quickly becomes prohibitive. This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills. We use a dataset of 1M+ images spanning 10k+ visual concepts to demonstrate webly-supervised concept expansion for two existing GPVs (GPV-1 and VL-T5) on 3 benchmarks: 5 COCO-based datasets (80 primary concepts), a newly curated series of 5 datasets based on the OpenImages and VisualGenome…
Taxonomy
Methods: VL-T5
