Tight Rates in Supervised Outlier Transfer Learning

Mohammadreza M. Kalan; Samory Kpotufe

arXiv:2310.04686·cs.LG·October 10, 2023·1 cites

Open Access 3 Reviews

TL;DR

This paper investigates the theoretical limits and potential of transfer learning for outlier detection, showing that even dissimilar sources can be beneficial and that adaptive methods can achieve optimal transfer.

Contribution

It establishes the information-theoretic limits of outlier transfer learning and demonstrates that adaptive procedures can attain these limits without prior knowledge of source-target discrepancy.

Findings

1. Seemingly dissimilar sources can provide significant information for transfer.

2. Information-theoretic limits of transfer are characterized under an extended discrepancy measure.

3. Adaptive procedures can achieve these limits without prior discrepancy knowledge.

Abstract

A critical barrier to learning an accurate decision rule for outlier detection is the scarcity of outlier data. As such, practitioners often turn to the use of similar but imperfect outlier data from which they might transfer information to the target outlier detection task. Despite the recent empirical success of transfer learning approaches in outlier detection, a fundamental understanding of when and how knowledge can be transferred from a source to a target outlier detection task remains elusive. In this work, we adopt the traditional framework of Neyman-Pearson classification -- which formalizes supervised outlier detection -- with the added assumption that one has access to some related but imperfect outlier data. Our main results are as follows: We first determine the information-theoretic limits of the problem under a measure of discrepancy that extends some existing notions…
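As a rough illustration of the Neyman-Pearson framing mentioned in the abstract, the sketch below picks a decision rule that minimizes the empirical Type-II error (missed outliers) subject to the empirical Type-I error (normal points flagged as outliers) staying at or below a level `alpha`. It is a minimal sketch under stated assumptions, not the paper's procedure: the one-dimensional threshold class, the function name `np_threshold`, and the toy data are all hypothetical, and no source data or transfer step is modeled.

```python
import numpy as np

def np_threshold(normal, outliers, alpha):
    """Empirical Neyman-Pearson rule over thresholds h_t(x) = 1[x > t].

    Hypothetical helper for illustration only: returns the threshold t that
    minimizes the empirical Type-II error subject to the empirical Type-I
    error being at most alpha, together with that Type-II error.
    """
    normal = np.asarray(normal, dtype=float)
    outliers = np.asarray(outliers, dtype=float)
    # Candidate thresholds: every observed value. t = +inf (flag nothing)
    # is the fallback and is always feasible, with Type-I error 0.
    candidates = np.sort(np.concatenate([normal, outliers]))
    best_t, best_type2 = np.inf, 1.0
    for t in candidates:
        type1 = np.mean(normal > t)      # normal points flagged as outliers
        type2 = np.mean(outliers <= t)   # outliers that are missed
        if type1 <= alpha and type2 < best_type2:
            best_t, best_type2 = t, type2
    return best_t, best_type2

# Toy data (assumed): ten normal points and four labeled outliers.
normal = np.arange(10.0)
outliers = np.array([8.5, 10.0, 11.0, 12.0])
t, type2 = np_threshold(normal, outliers, alpha=0.15)
print(t, type2)
```

With very few outlier samples the empirical Type-II error in this sketch is a noisy estimate, which is the scarcity problem the abstract describes as motivating the use of related source outlier data.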

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 8 · accept, good paper · Confidence 4

Strengths

1. Outlier detection in transfer learning is an interesting and valuable topic in the learning community.
2. The literature part is very clear.
3. The structure of the paper is easy to follow.
4. The setup of the paper is clear.
5. The paper provided solid theoretic results on the minimax bounds and rates.

Weaknesses

1. Only finite-sample results are provided. There is no further analysis of asymptotic properties on the large dataset.

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

- The paper is very well-presented.
- The theory is compelling and elegant.
- I think that the "same optimal classifier" setting between source and target distributions seems unrealistic (e.g., the setting of Figure 1) but I can see why from a theoretical standpoint, analyzing this simpler setting is a good starting point and already there are interesting insights, especially in contrasting this outlier setup to traditional classification.
- The extension of the transfer exponent to the outlier

Weaknesses

- As far as I can tell, this paper does not actually follow the ICLR LaTeX template. For instance, the margins don't appear correct? Please fix this.
- There are no numerical experiments. I think this paper would improve dramatically with experimental results, especially on real data, and especially on showing how well the adaptive method in Section 4.8 works in practice.
- Detailed discussion of how applied researchers address this outlier transfer problem in practice would be helpful to provid

Reviewer 03 · Rating 5 · marginally below the acceptance threshold · Confidence 3

Strengths

1. The paper studied an important practical problem.

Weaknesses

(1) A lower bound on the target-excess error is not as informative as an upper bound. Is it possible to derive an upper bound on the target-excess error under appropriate conditions?
(2) The algorithm proposed in Section 4.8 requires as input the VC dimension of the hypothesis class. However, in practice, the exact VC dimension may be unknown. Could you please give some practical suggestions on using this algorithm when the exact VC dimension is unknown?
(3) The notation in inequality (4.1) is

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Water Systems and Optimization · Domain Adaptation and Few-Shot Learning