Learning from Limited and Imperfect Data

Harsh Rangwani

arXiv:2507.21205·cs.LG·July 30, 2025

TL;DR

This paper develops practical deep learning algorithms that effectively handle limited, imbalanced, and imperfect real-world data, addressing challenges like long-tail distributions, domain shifts, and limited annotations.

Contribution

It introduces novel methods for generative modeling, regularization, metric optimization, and domain adaptation tailored for imperfect data scenarios.

Findings

01. Improved diversity in generative models for minority classes.

02. Effective generalization of tail classes without explicit data generation.

03. Enhanced domain adaptation with minimal labeled data.

Abstract

The distribution of data in the world (e.g., on the internet) differs significantly from that of well-curated datasets and is often over-populated with samples from common categories. Algorithms designed for well-curated datasets perform suboptimally when used to learn from imperfect datasets with long-tailed imbalances and distribution shifts. To expand the use of deep models, it is essential to overcome the labor-intensive curation process by developing robust algorithms that can learn from diverse, real-world data distributions. Toward this goal, we develop practical algorithms for deep neural networks that can learn from the limited and imperfect data present in the real world. This thesis is divided into four segments, each covering a scenario of learning from limited or imperfect data. The first part of the thesis focuses on Learning Generative Models from Long-Tail Data, where we…
