Learning from Limited and Imperfect Data
Harsh Rangwani

TL;DR
This paper develops practical algorithms enabling deep neural networks to learn effectively from limited, imbalanced, and imperfect real-world data, addressing challenges posed by long-tailed distributions and domain shifts.
Contribution
It introduces four novel approaches for learning from limited and imperfect data, including generative models, regularization schemes, metric optimization, and domain adaptation techniques.
Findings
Improved diversity in generative models for tail classes.
Enhanced generalization of tail classes without explicit image generation.
Effective semi-supervised learning and domain adaptation with minimal labels.
Abstract
The datasets used to train deep neural networks (e.g., ImageNet, MSCOCO) are often manually balanced across categories (classes) to facilitate learning of all the categories. This curation is expensive and requires discarding valuable annotated data to equalize class frequencies, because real-world data (e.g., from the internet) differs significantly from well-curated datasets and is often dominated by samples from common categories. Algorithms designed for well-curated datasets perform suboptimally when they learn from imperfect datasets with long-tailed imbalances and distribution shifts. For deep models to be widely used, it is necessary to do away with the costly curation process by developing robust algorithms that can learn from the real-world data distribution. Toward this goal, we develop practical…
Taxonomy
Topics: Machine Learning and Algorithms · Statistics Education and Methodologies
