Fill-Up: Balancing Long-Tailed Data with Generative Models
Joonghyuk Shin, Minguk Kang, Jaesik Park

TL;DR
This paper introduces a new image synthesis pipeline using Textual Inversion to generate synthetic data that balances long-tailed datasets, significantly improving recognition performance in imbalanced scenarios.
Contribution
It proposes a novel synthesis method with Textual Inversion that effectively aligns generated images with real data, enhancing long-tailed recognition performance.
Findings
Generated images improve recognition accuracy on long-tailed benchmarks.
Synthetic data filling mitigates class imbalance effectively.
Achieves state-of-the-art results when trained from scratch.
Abstract
Modern text-to-image synthesis models have achieved an exceptional level of photorealism, generating high-quality images from arbitrary text descriptions. In light of the impressive synthesis ability, several studies have exhibited promising results in exploiting generated data for image recognition. However, directly supplementing data-hungry situations in the real-world (e.g. few-shot or long-tailed scenarios) with existing approaches result in marginal performance gains, as they suffer to thoroughly reflect the distribution of the real data. Through extensive experiments, this paper proposes a new image synthesis pipeline for long-tailed situations using Textual Inversion. The study demonstrates that generated images from textual-inverted text tokens effectively aligns with the real domain, significantly enhancing the recognition ability of a standard ResNet50 backbone. We also show…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHandwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Advanced Neural Network Applications
