Data-Centric Debugging: mitigating model failures via targeted data collection
Sahil Singla, Atoosa Malemir Chegini, Mazda Moayeri, Soheil Feizi

TL;DR
This paper introduces Data-Centric Debugging, a method for improving model reliability in deployment. Given a small set of examples from a failure mode, it selects similar images from a large, noisily labeled data pool and uses them to improve performance on the error distribution.
Contribution
The paper proposes a systematic debugging framework that leverages feature similarity and advanced models like DINO ViTs to select relevant data, improving model performance on deployment failures.
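The selection step described above can be illustrated with a minimal sketch. It assumes image features have already been extracted (for example, by a DINO ViT) and stored as NumPy arrays, and it uses cosine similarity with a fixed top-k cutoff. The function names `select_similar` and `build_debug_set`, the cutoff `k`, and the choice of cosine similarity are assumptions made for illustration, not the paper's exact procedure.

```python
import numpy as np


def select_similar(query_feats, pool_feats, k):
    """For each query row, return indices and cosine similarities of the
    k most similar rows in pool_feats (hypothetical helper)."""
    eps = 1e-12  # guards against division by zero for all-zero feature rows
    q = query_feats / (np.linalg.norm(query_feats, axis=1, keepdims=True) + eps)
    p = pool_feats / (np.linalg.norm(pool_feats, axis=1, keepdims=True) + eps)
    sims = q @ p.T  # shape: (num_queries, num_pool)
    topk = np.argsort(-sims, axis=1)[:, :k]  # descending similarity
    return topk, np.take_along_axis(sims, topk, axis=1)


def build_debug_set(query_feats, pool_feats, k):
    """Union of the top-k pool neighbours over all failure samples,
    returned as sorted unique pool indices (hypothetical helper)."""
    idx, _ = select_similar(query_feats, pool_feats, k)
    return np.unique(idx)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random vectors stand in for real image embeddings (assumption).
    failure_feats = rng.normal(size=(5, 384))
    pool_feats = rng.normal(size=(1000, 384))
    selected = build_debug_set(failure_feats, pool_feats, k=10)
    print(f"selected {selected.size} pool images for the debug set")
```

Only the selected indices need to be kept for later training, which is one way a pipeline of this kind can avoid processing the full pool.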
Findings
DINO ViTs outperform ResNets in identifying similar images.
The method reduces compute and storage needs by 99.58%.
The method achieves a +9.45% improvement on debug-heldout sets.
Abstract
Deep neural networks can be unreliable in the real world when the training set does not adequately cover all the settings where they are deployed. Focusing on image classification, we consider the setting where we have an error distribution $\mathcal{E}$ representing a deployment scenario where the model fails. We have access to a small set of samples $\mathcal{E}_{sample}$ from $\mathcal{E}$ and it can be expensive to obtain additional samples. In the traditional model development framework, mitigating failures of the model in $\mathcal{E}$ can be challenging and is often done in an ad hoc manner. In this paper, we propose a general methodology for model debugging that can systemically improve model performance on $\mathcal{E}$ while maintaining its performance on the original test set. Our key assumption is that we have access to a large pool of weakly (noisily) labeled data…
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Multi-Head Attention · Attention Is All You Need · Softmax · Layer Normalization · Linear Layer · Dense Connections · Vision Transformer · Test · Batch Normalization · Residual Connection
