Forte : Finding Outliers with Representation Typicality Estimation
Debargha Ganguly, Warren Morningstar, Andrew Yu, Vipin Chaudhary

TL;DR
Forte introduces a novel representation-based method for out-of-distribution detection that outperforms existing unsupervised approaches by estimating data typicality through manifold-based summary statistics.
Contribution
The paper proposes a new approach leveraging representation learning and manifold estimation to improve OOD detection, addressing limitations of likelihood-based methods.
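As a rough illustration of the idea, the sketch below scores a point by how typical its neighbourhood statistics look relative to in-distribution data. It is a minimal, dependency-free example under stated assumptions, not the authors' implementation: the two statistics (a k-NN radius and a density-like count), the diagonal Gaussian used as the density model, the toy 2-D "features", and every function name are hypothetical choices made for this example. In practice the features would come from a pretrained representation model rather than random draws.

```python
import math
import random
import statistics


def kth_nn_distance(point, reference, k, skip_self=False):
    """Distance from `point` to its k-th nearest neighbour in `reference`."""
    dists = sorted(math.dist(point, r) for r in reference)
    # If `point` is itself a member of `reference`, index 0 is the zero
    # self-distance, so the k-th true neighbour sits at index k.
    return dists[k] if skip_self else dists[k - 1]


def summary_stats(point, reference, radii, k):
    """Two illustrative per-point statistics (assumed, not from the paper)."""
    # Distance to the k-th nearest reference point: large when far from the data.
    radius = kth_nn_distance(point, reference, k)
    # Number of reference k-NN balls that contain the point, scaled by k.
    hits = sum(math.dist(point, r) <= rad for r, rad in zip(reference, radii))
    return radius, hits / k


def fit_typicality(reference, held_out, k=5):
    """Fit a diagonal Gaussian to the statistics of held-out in-distribution data."""
    radii = [kth_nn_distance(r, reference, k, skip_self=True) for r in reference]
    stats = [summary_stats(p, reference, radii, k) for p in held_out]
    columns = list(zip(*stats))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1e-9 for c in columns]  # avoid divide-by-zero
    return radii, means, stds


def atypicality(point, reference, model, k=5):
    """Sum of squared z-scores of the point's statistics; higher = less typical."""
    radii, means, stds = model
    stats = summary_stats(point, reference, radii, k)
    return sum(((s - m) / sd) ** 2 for s, m, sd in zip(stats, means, stds))


# Toy data standing in for encoder features (assumption: 2-D standard normal).
random.seed(0)
reference = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
held_out = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(100)]

model = fit_typicality(reference, held_out)
near_score = atypicality((0.1, -0.2), reference, model)  # inside the data cloud
far_score = atypicality((8.0, 8.0), reference, model)    # far from the data cloud
print(near_score < far_score)
```

A point is flagged as out-of-distribution when its score exceeds a threshold chosen on held-out in-distribution data. The Gaussian here is only the simplest stand-in for a density model; any other estimator over the same statistics could be substituted.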
Findings
Outperforms existing unsupervised OOD detection methods
Achieves state-of-the-art results on benchmark datasets
Effective in synthetic data detection tasks
Abstract
Generative models can now produce photorealistic synthetic data that is virtually indistinguishable from the real data used to train them. This is a significant evolution over previous models, which could produce reasonable facsimiles of the training data, but ones that human evaluators could visually distinguish from it. Recent work on OOD detection has raised doubts that generative model likelihoods are optimal OOD detectors due to issues involving likelihood misestimation, entropy in the generative process, and typicality. We speculate that generative OOD detectors also failed because their models focused on the pixels rather than the semantic content of the data, leading to failures in near-OOD cases where the pixels may be similar but the information content is significantly different. We hypothesize that estimating typical sets using self-supervised learners…
Peer Reviews
Decision: ICLR 2025 Poster
The paper is very well written, easy to follow, and highly implementable. An extensive appendix provides supporting information and data.
None
[The following is based on my guess of the proposed method, which is not well described in the paper.]
+ The proposed approach is a simple change over feature-space OOD methods, and appears effective.
+ The experiments seem to cover a wide range of scenarios.
+ The paper is extremely poorly written. I list some major issues here.
1. None of the math LaTeX in Section 3.2 is well formatted. Subscripts and superscripts are wrong.
2. Variables are used without definition, e.g., $\text{nearest}_k$ in Section 3.2. Is it different from the $k$ below?
3. No description is given of how the four metrics are used. Are they used as the "summary statistics" that the proposed method models?
4. The method, referred to as "Forte", is never truly defi
- **Performance**: Forte demonstrates superior OOD detection performance compared to state-of-the-art methods across multiple benchmarks, including synthetic data and medical image datasets, which often present significant OOD detection challenges.
- **Flexibility**: Forte’s unsupervised nature eliminates the need for labeled data or pre-exposure to OOD samples, making it adaptable to various tasks and practical for real-world applications where OOD examples may not be available during training.

1. **Paper Structure**: The paper allocates a substantial portion of its Introduction to reviewing existing OOD detection literature and explaining the typicality concept. This approach detracts from an immediate focus on the novel contributions and design of Forte, which may hinder reader engagement and understanding of the primary contributions.
2. **Complexity in Practical Implementation**: The integration of multiple representation learning techniques, combined with non-parametric density es
Taxonomy
Topics: Advanced Statistical Methods and Models
