Fill-Up: Balancing Long-Tailed Data with Generative Models

Joonghyuk Shin; Minguk Kang; Jaesik Park

arXiv:2306.07200·cs.CV·June 13, 2023·5 cites

Open Access

TL;DR

This paper introduces a new image synthesis pipeline using Textual Inversion to generate synthetic data that balances long-tailed datasets, significantly improving recognition performance in imbalanced scenarios.

Contribution

It proposes a novel synthesis method with Textual Inversion that effectively aligns generated images with real data, enhancing long-tailed recognition performance.

Findings

01. Generated images improve recognition accuracy on long-tailed benchmarks.

02. Filling under-represented classes with synthetic data mitigates class imbalance.

03. The method achieves state-of-the-art results when models are trained from scratch.
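
The idea of filling a long-tailed dataset with synthetic images can be illustrated with a small sketch. It assumes the goal is to bring every class up to the size of the largest class; this summary does not state the paper's actual filling policy, so that target is an assumption. The function name `fill_up_counts` and the toy labels are hypothetical, and the image generation step itself (Textual Inversion) is not shown.

```python
from collections import Counter


def fill_up_counts(labels, target=None):
    """Return how many synthetic images each class would need.

    Assumption (not from the paper): every class is filled up to
    `target`, which defaults to the size of the largest class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no synthetic images.
    return {cls: max(target - n, 0) for cls, n in counts.items()}


# Toy long-tailed label list: one head class and two tail classes.
labels = ["cat"] * 100 + ["dog"] * 20 + ["owl"] * 5
needed = fill_up_counts(labels)
print(needed)  # {'cat': 0, 'dog': 80, 'owl': 95}
```

In a real pipeline, the per-class counts returned here would decide how many images to request from the generative model for each tail class.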

Abstract

Modern text-to-image synthesis models have achieved an exceptional level of photorealism, generating high-quality images from arbitrary text descriptions. In light of this impressive synthesis ability, several studies have shown promising results in exploiting generated data for image recognition. However, directly supplementing data-hungry situations in the real world (e.g., few-shot or long-tailed scenarios) with existing approaches results in marginal performance gains, as they fail to thoroughly reflect the distribution of the real data. Through extensive experiments, this paper proposes a new image synthesis pipeline for long-tailed situations using Textual Inversion. The study demonstrates that images generated from textual-inverted text tokens effectively align with the real domain, significantly enhancing the recognition ability of a standard ResNet50 backbone. We also show…

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications