# Multi-Method Ensemble for Out-of-Distribution Detection

**Authors:** Lucas Rakotoarivony

arXiv: 2508.21463 · 2025-09-01

## TL;DR

This paper introduces the Multi-Method Ensemble (MME) score, which combines state-of-the-art feature truncation techniques and scoring functions into a single out-of-distribution (OOD) detector. MME outperforms recent methods on large-scale and small-scale benchmarks, in both near-OOD and far-OOD settings.

## Contribution

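The paper shows, theoretically and empirically, that state-of-the-art feature truncation and scoring functions can be combined, and that aggregating several scoring functions improves robustness across different types of OOD samples. It packages these findings as the MME score, a single scoring function that unifies existing OOD detectors.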

## Key findings

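- MME outperforms recent state-of-the-art methods on both large-scale and small-scale benchmarks, in near-OOD and far-OOD scenarios.
- With the BiT model, MME achieves an average FPR95 of 27.57% on ImageNet-1K, a 6% improvement over the best existing baseline.
- Aggregating multiple scoring functions improves robustness against various types of OOD samples.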
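
For readers unfamiliar with the metric, FPR95 is commonly defined as the fraction of OOD samples that are still accepted as in-distribution at the threshold that retains 95% of ID samples, so lower is better. The sketch below assumes that a higher score means "more in-distribution"; the function name `fpr_at_95_tpr` is illustrative and is not taken from the paper.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """False positive rate on OOD data at the threshold keeping >= 95% of ID data.

    Assumption: higher score = more in-distribution.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")

    n = len(id_scores)
    # Smallest number of ID samples that covers at least 95% of them
    # (integer ceiling of 0.95 * n, avoiding floating-point rounding).
    n_keep = (95 * n + 99) // 100

    # The threshold is the n_keep-th highest ID score, so at least
    # n_keep ID samples score at or above it.
    threshold = sorted(id_scores, reverse=True)[n_keep - 1]

    # OOD samples at or above the threshold are wrongly accepted as ID.
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)
```

A detector that separates the two sets perfectly yields 0.0, and one whose OOD scores all exceed the ID threshold yields 1.0.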

## Abstract

Detecting out-of-distribution (OOD) samples is essential for neural networks operating in open-world settings, particularly in safety-critical applications. Existing methods have improved OOD detection by leveraging two main techniques: feature truncation, which increases the separation between in-distribution (ID) and OOD samples, and scoring functions, which assign scores to distinguish between ID and OOD data. However, most approaches either focus on a single family of techniques or evaluate their effectiveness on a specific type of OOD dataset, overlooking the potential of combining multiple existing solutions. Motivated by this observation, we theoretically and empirically demonstrate that state-of-the-art feature truncation and scoring functions can be effectively combined. Moreover, we show that aggregating multiple scoring functions enhances robustness against various types of OOD samples. Based on these insights, we propose the Multi-Method Ensemble (MME) score, which unifies state-of-the-art OOD detectors into a single, more effective scoring function. Extensive experiments on both large-scale and small-scale benchmarks, covering near-OOD and far-OOD scenarios, show that MME significantly outperforms recent state-of-the-art methods across all benchmarks. Notably, using the BiT model, our method achieves an average FPR95 of 27.57% on the challenging ImageNet-1K benchmark, improving performance by 6% over the best existing baseline.
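
The three ingredients named in the abstract (feature truncation, a scoring function, and aggregation of several scores) can be illustrated with a minimal sketch. It is not the paper's MME formula, which this summary does not give. As stand-ins it uses activation clipping in the style of ReAct for truncation, the energy (log-sum-exp) score and the maximum softmax probability as scoring functions, and a plain average of standardized scores as the aggregation rule. All function names, the `cap` value, and the `id_stats` normalization statistics are hypothetical.

```python
import math


def clip_features(features, cap):
    """Feature truncation (ReAct-style stand-in): cap each activation at `cap`."""
    return [min(f, cap) for f in features]


def linear_logits(features, weights, bias):
    """Logits of a final linear layer; `weights` holds one row per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def energy_score(logits):
    """Log-sum-exp of the logits; higher is treated as more ID-like."""
    m = max(logits)  # subtract the max for numerical stability
    return m + math.log(sum(math.exp(z - m) for z in logits))


def msp_score(logits):
    """Maximum softmax probability; higher is treated as more ID-like."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    return max(exps) / sum(exps)


def combined_score(features, weights, bias, cap, id_stats):
    """Average of standardized scores computed on truncated features.

    Assumption: `id_stats` maps each score name to a (mean, std) pair
    estimated on held-out ID data, so the scores share a common scale.
    This averaging rule is an illustration, not the paper's MME score.
    """
    logits = linear_logits(clip_features(features, cap), weights, bias)
    raw = {"energy": energy_score(logits), "msp": msp_score(logits)}
    standardized = [(raw[name] - id_stats[name][0]) / id_stats[name][1]
                    for name in raw]
    return sum(standardized) / len(standardized)
```

Standardizing before averaging is one simple way to keep a score with a large numeric range, such as energy, from dominating one bounded in [0, 1], such as maximum softmax probability.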

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21463/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21463/full.md

## References

40 references — full list in the complete paper: https://tomesphere.com/paper/2508.21463/full.md

---
Source: https://tomesphere.com/paper/2508.21463