Assessing Model Generalization in Vicinity
Yuchi Liu, Yifan Sun, Jingdong Wang, Liang Zheng

TL;DR
This paper introduces the vicinal risk proxy (VRP), a label-free method that assesses model generalization on out-of-distribution data by leveraging neighboring test sample responses, improving correlation with true accuracy.
Contribution
The paper proposes VRP, a novel label-free approach that incorporates neighboring sample responses to better estimate model accuracy on out-of-distribution data.
Findings
VRP improves correlation with actual accuracy over existing metrics.
Applying VRP to confidence and invariance metrics enhances their predictive power.
The method is effective on challenging out-of-distribution test sets.
Abstract
This paper evaluates the generalization ability of classification models on out-of-distribution test sets without depending on ground truth labels. Common approaches often calculate an unsupervised metric related to a specific model property, like confidence or invariance, which correlates with out-of-distribution accuracy. However, these metrics are typically computed for each test sample individually, leading to potential issues caused by spurious model responses, such as overly high or low confidence. To tackle this challenge, we propose incorporating responses from neighboring test samples into the correctness assessment of each individual sample. In essence, if a model consistently demonstrates high correctness scores for nearby samples, it increases the likelihood of correctly predicting the target sample, and vice versa. The resulting scores are then averaged across all test…
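The neighbor-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of maximum softmax probability as the per-sample score, cosine similarity between softmax vectors as the neighborhood measure, the exclusion of a sample from its own neighborhood, and the names `vicinal_scores` and `k` are all assumptions made here for illustration.

```python
import numpy as np


def vicinal_scores(probs, k=2):
    """Replace each sample's confidence with a similarity-weighted
    average of the confidences of its k most similar test samples.

    Assumptions (not taken from the paper): the per-sample score is the
    max softmax probability, and similarity is the cosine similarity
    between softmax vectors.
    """
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[0]
    conf = probs.max(axis=1)          # per-sample confidence score
    if n < 2:
        return conf                   # no neighbors to borrow from
    k = min(k, n - 1)

    # Cosine similarity between every pair of softmax vectors.
    unit = probs / np.linalg.norm(probs, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)    # a sample is not its own neighbor

    out = np.empty(n)
    for i in range(n):
        nbrs = np.argsort(sim[i])[-k:]            # k most similar samples
        out[i] = np.average(conf[nbrs], weights=sim[i, nbrs])
    return out


# Hypothetical softmax outputs for four unlabeled test samples.
probs = [[0.9, 0.1], [0.8, 0.2], [0.55, 0.45], [0.1, 0.9]]
scores = vicinal_scores(probs, k=2)
dataset_score = scores.mean()         # label-free proxy for the test set
```

Averaging the smoothed scores over the test set gives a single label-free number per model. In this sketch, a sample with a spuriously extreme confidence has less influence on its own score, because that score is determined by the responses of its neighbors.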
Taxonomy
Topics: Simulation Techniques and Applications · Machine Learning and Data Classification · Topic Modeling
