VLMine: Long-Tail Data Mining with Vision Language Models

Mao Ye; Gregory P. Meyer; Zaiwei Zhang; Dennis Park; Siva Karthik Mustikovela; Yuning Chai; Eric M Wolff

arXiv:2409.15486 · cs.CV · September 25, 2024

TL;DR

This paper introduces VLMine, a scalable data mining method using vision language models to identify long-tail, rare examples in unlabeled data, improving performance on diverse image and 3D detection benchmarks.

Contribution

The work presents a novel approach that leverages vision language models for long-tail data mining, integrates signals from multiple mining algorithms, and demonstrates transferability across 2D and 3D tasks.
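The idea of combining several mining signals can be sketched as follows. This is a minimal illustration, not the paper's method: this page does not say how the signals are integrated, so the rank-averaging rule, the function names `rank_normalize` and `fuse_signals`, and the example scores are all assumptions made for the sketch.

```python
def rank_normalize(scores):
    """Map raw scores to [0, 1] by rank, so the highest score becomes 1.0.

    Rank normalization is an assumption of this sketch; it makes signals
    with different scales (e.g. keyword rarity vs. model uncertainty)
    comparable before they are combined.
    """
    ordered = sorted(scores, key=lambda i: (scores[i], i))
    n = len(ordered)
    if n == 1:
        return {ordered[0]: 1.0}
    return {image_id: rank / (n - 1) for rank, image_id in enumerate(ordered)}


def fuse_signals(signals, weights=None):
    """Combine per-image scores from several miners by weighted rank averaging.

    `signals` maps a (hypothetical) miner name to {image_id: score}, where a
    higher score means "more likely to be a rare example".
    """
    names = list(signals)
    weights = weights or {name: 1.0 for name in names}
    # Only images scored by every miner can be fused.
    ids = set.intersection(*(set(signals[name]) for name in names))
    ranked = {
        name: rank_normalize({i: signals[name][i] for i in ids})
        for name in names
    }
    total = sum(weights[name] for name in names)
    return {
        i: sum(weights[name] * ranked[name][i] for name in names) / total
        for i in ids
    }


# Made-up scores for four images from two hypothetical miners.
signals = {
    "vlm_keyword_rarity": {"a": 0.0, "b": 1.4, "c": 0.2, "d": 0.7},
    "model_uncertainty": {"a": 0.3, "b": 0.8, "c": 0.1, "d": 0.5},
}
fused = fuse_signals(signals)
```

Averaging ranks rather than raw values is one simple way to keep either signal from dominating only because of its numeric scale; other fusion rules would fit the same interface.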

Findings

01. Achieves 10-50% improvements over baselines on benchmarks.

02. VLM provides a distinct signal for rare example detection.

03. Method is effective across 2D and 3D data domains.

Abstract

Ensuring robust performance on long-tail examples is an important problem for many real-world applications of machine learning, such as autonomous driving. This work focuses on the problem of identifying rare examples within a corpus of unlabeled data. We propose a simple and scalable data mining approach that leverages the knowledge contained within a large vision language model (VLM). Our approach utilizes a VLM to summarize the content of an image into a set of keywords, and we identify rare examples based on keyword frequency. We find that the VLM offers a distinct signal for identifying long-tail examples when compared to conventional methods based on model uncertainty. Therefore, we propose a simple and general approach for integrating signals from multiple mining algorithms. We evaluate the proposed method on two diverse tasks: 2D image classification, in which inter-class…
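The keyword-frequency idea described in the abstract can be illustrated with a short sketch. It assumes the VLM has already produced a keyword list per image; the scoring rule (an image is as rare as its rarest keyword, measured by log inverse frequency), the function names, and the toy corpus are assumptions of this example and are not taken from the paper.

```python
from collections import Counter
import math


def keyword_rarity_scores(keywords_per_image):
    """Score each image by the rarity of its rarest keyword.

    `keywords_per_image` maps image_id -> list of keywords, assumed to come
    from a VLM summarizing the image. The scoring rule is illustrative.
    """
    n_images = len(keywords_per_image)
    # Count in how many images each keyword appears (once per image).
    doc_freq = Counter()
    for keywords in keywords_per_image.values():
        doc_freq.update(set(keywords))

    scores = {}
    for image_id, keywords in keywords_per_image.items():
        if not keywords:
            scores[image_id] = 0.0
            continue
        # Log inverse frequency: 0.0 for a keyword present in every image.
        scores[image_id] = max(
            math.log(n_images / doc_freq[k]) for k in set(keywords)
        )
    return scores


def mine_rare(keywords_per_image, top_k):
    """Return the top_k image ids with the highest rarity score."""
    scores = keyword_rarity_scores(keywords_per_image)
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_k]


# Toy corpus with made-up keywords.
corpus = {
    "a": ["car", "road"],
    "b": ["car", "road", "pedestrian"],
    "c": ["car", "road", "horse-drawn carriage"],
    "d": ["car", "road"],
}
scores = keyword_rarity_scores(corpus)
rare = mine_rare(corpus, top_k=2)
```

In this toy corpus, images "b" and "c" each contain a keyword seen only once, so they rank ahead of "a" and "d", whose keywords appear in every image.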

Taxonomy

Topics: Semantic Web and Ontologies · Text and Document Classification Technologies · Data Management and Algorithms

Methods: Sparse Evolutionary Training