TULiP: Test-time Uncertainty Estimation via Linearization and Weight Perturbation

Yuhui Zhang; Dongshen Wu; Yuichiro Wada; Takafumi Kanamori

arXiv:2505.16923 · stat.ML · May 26, 2025

TL;DR

TULiP is a post-hoc uncertainty estimation method for out-of-distribution detection that leverages linearization and weight perturbation to provide reliable uncertainty scores, achieving state-of-the-art results.

Contribution

It introduces a theoretically-driven approach to estimate uncertainty by analyzing network perturbations based on linearized training dynamics, applicable post-training.

Findings

01. Achieves state-of-the-art OOD detection performance.

02. Effective for near-distribution samples.

03. Visualized bounds on synthetic datasets.

Abstract

A reliable uncertainty estimation method is the foundation of many modern out-of-distribution (OOD) detectors, which are critical for safe deployments of deep learning models in the open world. In this work, we propose TULiP, a theoretically-driven post-hoc uncertainty estimator for OOD detection. Our approach considers a hypothetical perturbation applied to the network before convergence. Based on linearized training dynamics, we bound the effect of such perturbation, resulting in an uncertainty score computable by perturbing model parameters. Ultimately, our approach computes uncertainty from a set of sampled predictions. We visualize our bound on synthetic regression and classification datasets. Furthermore, we demonstrate the effectiveness of TULiP using large-scale OOD detection benchmarks for image classification. Our method exhibits state-of-the-art performance, particularly for…
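The idea of computing uncertainty from a set of predictions sampled under weight perturbation can be illustrated with a minimal sketch. This is not the TULiP bound or the authors' implementation: the linear softmax classifier, the isotropic Gaussian noise with scale `sigma`, the mutual-information score, and the function name `perturbation_uncertainty` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def perturbation_uncertainty(W, b, x, n_samples=32, sigma=0.1, seed=0):
    """Illustrative score from predictions sampled under weight noise.

    W: (d, k) weights and b: (k,) bias of a hypothetical linear classifier.
    x: (n, d) inputs. Returns (mean_probs, score), where score is the
    mutual information between the prediction and the weight perturbation.
    """
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_samples):
        # Assumed perturbation: isotropic Gaussian noise on all parameters.
        W_s = W + sigma * rng.standard_normal(W.shape)
        b_s = b + sigma * rng.standard_normal(b.shape)
        probs.append(softmax(x @ W_s + b_s))
    probs = np.stack(probs)              # (n_samples, n, k)
    mean_probs = probs.mean(axis=0)      # (n, k)
    eps = 1e-12
    # Entropy of the averaged prediction (total uncertainty).
    total = -(mean_probs * np.log(mean_probs + eps)).sum(axis=-1)
    # Average entropy of the individual sampled predictions.
    expected = -(probs * np.log(probs + eps)).sum(axis=-1).mean(axis=0)
    return mean_probs, total - expected
```

In this sketch a larger score means the sampled predictions disagree more, which is the kind of signal an OOD detector could threshold. With `sigma=0.0` all samples coincide and the score collapses to zero.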

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 2

Strengths

This paper is overall well-written. The idea is theoretically driven and offers good interpretability. The method is thoroughly evaluated and demonstrates excellent performance compared to many OOD detection methods.

Weaknesses

1. What is the motivation for calculating an upper bound on the variation for uncertainty quantification? As shown in Eq. 1, the objective is to estimate the variance under different parameter initializations. To do this, the DNN is first linearized locally using NTK theory, and the upper bound on the effect of the introduced changes is then calculated with the same theory. The paradox is that if the parameters can already be perturbed, why is NTK needed to calculate the upper bound? Besides, calculating the

Reviewer 02 · Rating 6 · Confidence 3

Strengths

* The paper is clearly written and experiments have been extensively conducted across a set of diverse datasets.
* Theoretical analysis is thorough.

Weaknesses

* There are so many hyperparameters that implementing the method in a realistic scenario may be difficult.
* Performance on far-OOD is not good enough.

Reviewer 03 · Rating 5 · Confidence 4

Strengths

- This work proposes an uncertainty-based score to detect both semantic-shift OOD and covariate-shift OOD without accessing the training data.
- The derivation of the proposed bound for ||f_T(x)-\hat{f}_T(x)|| and its upper bound are written down thoroughly.

Weaknesses

- Line 018: Could the authors clarify what is meant by “other problem settings”?
- Connection Between Concepts (Lines 72-73): The relationship between semantic shift and covariate shift in Out-of-Distribution (OOD) detection and epistemic uncertainty is not clearly motivated [1]. Could the authors provide further elaboration on this connection?
- Post-hoc OOD Detectors: The related work section appears somewhat outdated and incomplete. A notable aspect of TULiP is its ability to perform OOD dete

Taxonomy

Methods: Sparse Evolutionary Training