Identifying and Mitigating Model Failures through Few-shot CLIP-aided Diffusion Generation
Atoosa Chegini, Soheil Feizi

TL;DR
This paper introduces an end-to-end framework that leverages large language models and CLIP to automatically identify failure modes of deep learning models and generate synthetic data targeting them, significantly improving accuracy on challenging sub-populations.
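The summary does not say how hard sub-populations are detected. As a minimal sketch, assuming each validation prediction carries a hypothetical context tag (for example a background such as "snow"), one simple criterion flags (class, context) groups whose accuracy falls well below the accuracy of their class as a whole. The function name, the `gap` threshold, and the `min_size` cutoff are illustrative choices, not the paper's procedure.

```python
from collections import defaultdict
from typing import Iterable


def find_hard_subpopulations(
    records: Iterable[tuple[str, str, bool]],
    gap: float = 0.2,
    min_size: int = 5,
) -> list[tuple[str, str, float]]:
    """Flag (class, context) groups whose accuracy trails their class's accuracy.

    Each record is (class_label, context_tag, is_correct). The context tag is a
    hypothetical per-image annotation (e.g. a background such as "snow").
    """
    group_counts = defaultdict(lambda: [0, 0])  # (label, context) -> [correct, total]
    class_counts = defaultdict(lambda: [0, 0])  # label -> [correct, total]
    for label, context, correct in records:
        for counts in (group_counts[(label, context)], class_counts[label]):
            counts[0] += int(correct)
            counts[1] += 1

    hard = []
    for (label, context), (hits, total) in group_counts.items():
        if total < min_size:
            continue  # too few samples for a reliable accuracy estimate
        group_acc = hits / total
        class_hits, class_total = class_counts[label]
        if class_hits / class_total - group_acc >= gap:
            hard.append((label, context, group_acc))
    return sorted(hard, key=lambda item: item[2])  # hardest groups first


if __name__ == "__main__":
    records = (
        [("dog", "grass", True)] * 18
        + [("dog", "grass", False)] * 2
        + [("dog", "snow", True)] * 2
        + [("dog", "snow", False)] * 4
    )
    print(find_hard_subpopulations(records))
```

Here the dog class is right on 20 of 26 images (about 0.77), the grass group on 18 of 20 (0.9), and the snow group on only 2 of 6, so the call prints `[('dog', 'snow', 0.3333333333333333)]`.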
Contribution
The study presents a novel method combining language and vision models to automatically describe failure modes and generate synthetic data, enhancing model robustness in a few-shot setting.
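How the generated descriptions become inputs to the diffusion model is not detailed here. As a hedged illustration, assuming a hypothetical prompt template and a fixed target number of training images per sub-population, the sketch below plans one text prompt per missing image so that the rarest (few-shot) groups receive the most synthetic data. It only builds prompt strings; the actual text-to-image call, and the CLIP-aided few-shot conditioning named in the title, are outside its scope.

```python
def plan_synthetic_prompts(
    group_sizes: dict[tuple[str, str], int],
    target_per_group: int,
    template: str = "a photo of a {label} {context}",
) -> list[str]:
    """Return one text-to-image prompt per synthetic image still needed.

    group_sizes maps (class_label, context description) to the number of real
    training images available for that sub-population. The template and the
    fixed per-group target are hypothetical choices, not the paper's.
    """
    prompts = []
    for (label, context), have in group_sizes.items():
        missing = max(0, target_per_group - have)  # rarest groups get the most
        prompts.extend([template.format(label=label, context=context)] * missing)
    return prompts


if __name__ == "__main__":
    sizes = {("dog", "in the snow"): 3, ("cat", "on a beach"): 10}
    prompts = plan_synthetic_prompts(sizes, target_per_group=8)
    print(len(prompts), prompts[0])
```

With 3 real "dog in the snow" images and a target of 8, the call yields five copies of "a photo of a dog in the snow", while the "cat on a beach" group, which already has 10 images, gets none.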
Findings
Achieved approximately 21% accuracy improvement on hard sub-populations.
Demonstrated effectiveness across 40 different models and multiple datasets.
Enabled automatic failure mode analysis without human intervention.
Abstract
Deep learning models can encounter unexpected failures, especially when dealing with challenging sub-populations. One common reason for these failures is the occurrence of objects in backgrounds that are rarely seen during training. To gain a better understanding of these failure modes, human-interpretable descriptions are crucial for further analysis and improvement, but obtaining them manually is expensive. In this study, we propose an end-to-end framework that utilizes the capabilities of large language models (ChatGPT) and vision-language deep models (CLIP) to generate text descriptions of failure modes associated with spurious correlations (e.g., rarely seen backgrounds) without human-in-the-loop intervention. These descriptions can be used to generate synthetic data with generative models, such as diffusion models. The model can then use this generated data to learn from its weaknesses and enhance its…
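CLIP compares an image and a text by the cosine similarity of their embeddings. A minimal sketch of how such scores could single out a failure description, assuming the candidate descriptions (for instance ones proposed by ChatGPT) and the images have already been embedded by CLIP's encoders: rank each description by how much more it matches the misclassified images than the correctly classified ones. The toy 2-D vectors stand in for real CLIP embeddings, and the difference-of-means score is an assumption, not necessarily the paper's criterion.

```python
import math
from typing import Sequence

# A plain list of floats stands in for a CLIP embedding (assumption: the real
# vectors would come from CLIP's image and text encoders).
Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, the score CLIP uses to compare image and text embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mean_similarity(text_emb: Vector, image_embs: Sequence[Vector]) -> float:
    """Average cosine similarity between one description and a set of images."""
    return sum(cosine(text_emb, img) for img in image_embs) / len(image_embs)


def rank_descriptions(
    descriptions: dict[str, Vector],
    failure_embs: Sequence[Vector],
    success_embs: Sequence[Vector],
) -> list[tuple[str, float]]:
    """Rank descriptions by how much better they match failures than successes.

    The difference-of-means score is an illustrative assumption.
    """
    scores = {
        text: mean_similarity(emb, failure_embs) - mean_similarity(emb, success_embs)
        for text, emb in descriptions.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-D embeddings: the first axis loosely means "snow", the second "grass".
    descriptions = {"a dog in the snow": [1.0, 0.0], "a dog on grass": [0.0, 1.0]}
    misclassified = [[0.9, 0.1], [1.0, 0.2]]
    correct = [[0.1, 1.0], [0.2, 0.9]]
    for text, score in rank_descriptions(descriptions, misclassified, correct):
        print(f"{score:+.3f}  {text}")
```

Because the misclassified toy images lie close to the "snow" direction, "a dog in the snow" ranks first with a positive score and "a dog on grass" last with a negative one. Subtracting the similarity to correctly classified images keeps descriptions that simply match the class in general from ranking high.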
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Attention Is All You Need · Dropout · Linear Layer · Dense Connections · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Diffusion · Label Smoothing · Byte Pair Encoding
