Beyond Toy Benchmarks: A Systematic Evaluation of OOD Detection Methods For Plant Pathology Classification
Devesh Shah

TL;DR
This paper systematically evaluates OOD detection methods on a real-world plant pathology dataset, finding that energy-based fine-tuning performs best and documenting practical training challenges.
Contribution
It assesses six OOD detection techniques on Plant Pathology 2021, a fine-grained real-world dataset with natural distribution shifts, revealing their strengths and limitations.
Findings
Energy-based fine-tuning improves OOD detection and maintains in-distribution accuracy.
Embedding space restructuring and calibration are key to detection gains.
Scaling constrained optimization methods to moderate-sized datasets introduces training instabilities.
Abstract
Out-of-distribution (OOD) detection is essential for reliable deployment of deep learning systems, yet most existing methods are evaluated on small, visually homogeneous benchmarks. In this work, we study six OOD detection methods spanning post-hoc scoring, auxiliary objectives, energy-based models, and constrained optimization on the Plant Pathology 2021 dataset, a fine-grained task with natural distribution shifts. Energy-based fine-tuning performs best across OOD settings, improving detection over the softmax baseline while preserving in-distribution accuracy. Analysis shows these gains stem from both a restructuring of the embedding space and calibration of the scoring function. We further document practical training instabilities that arise when scaling constrained optimization methods to moderate-sized datasets, findings that are largely absent from existing…
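To make the comparison between the softmax baseline and energy-based scoring concrete, here is a minimal sketch of how the two scores are commonly computed from a classifier's logits, along with the squared-hinge margin penalty often used for energy-based fine-tuning. The abstract does not give the paper's exact formulation, so the function names, the temperature default, and the margins `m_in` and `m_out` are illustrative assumptions rather than the author's settings.

```python
import math

def logsumexp(values):
    # Numerically stable log(sum(exp(v))) by subtracting the maximum first.
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def msp_score(logits):
    # Softmax baseline: maximum softmax probability.
    # Higher values suggest an in-distribution input.
    return math.exp(max(logits) - logsumexp(logits))

def free_energy(logits, temperature=1.0):
    # Free energy E(x) = -T * logsumexp(logits / T).
    # Lower energy suggests an in-distribution input, so -E(x) can serve
    # as the detection score.
    return -temperature * logsumexp([z / temperature for z in logits])

def energy_margin_penalty(in_energies, out_energies, m_in, m_out):
    # Squared-hinge regularizer (assumed form): penalize in-distribution
    # energies above m_in and auxiliary OOD energies below m_out.
    # m_in and m_out are hypothetical margins chosen for illustration.
    in_term = sum(max(0.0, e - m_in) ** 2 for e in in_energies) / len(in_energies)
    out_term = sum(max(0.0, m_out - e) ** 2 for e in out_energies) / len(out_energies)
    return in_term + out_term

# Made-up logits: one confident prediction, one nearly flat prediction.
confident_logits = [9.0, 1.0, 0.5]
flat_logits = [0.4, 0.3, 0.2]

for name, logits in [("confident", confident_logits), ("flat", flat_logits)]:
    print(name, "MSP:", msp_score(logits), "energy:", free_energy(logits))
```

In a fine-tuning loop, a penalty of this kind would typically be added to the usual cross-entropy loss with a small weight, which is one way the scoring function can be calibrated without giving up in-distribution accuracy.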
