Tree-based Ensemble Learning for Out-of-distribution Detection
Zhaiming Shen, Menglun Wang, Guang Cheng, Ming-Jun Lai, Lin Mu, Ruihao Huang, Qi Liu, Hao Zhu

TL;DR
This paper introduces TOOD detection, a tree-based method for out-of-distribution detection that uses pairwise Hamming distances of tree embeddings to effectively distinguish in-distribution from out-of-distribution samples across multiple data types.
Contribution
The paper presents a novel, interpretable, and efficient tree-based out-of-distribution detection method that generalizes well to various data modalities and unsupervised settings.
Findings
Outperforms state-of-the-art OOD detection methods on tabular, image, and text data.
Provides a robust and interpretable approach based on tree embeddings.
Efficient and flexible for different machine learning tasks.
Abstract
Being able to successfully determine whether the testing samples have a similar distribution to the training samples is a fundamental question to address before we can safely deploy most machine learning models into practice. In this paper, we propose TOOD detection, a simple yet effective tree-based out-of-distribution (TOOD) detection mechanism to determine whether a set of unseen samples has a similar distribution to the training samples. The TOOD detection mechanism is based on computing the pairwise Hamming distance of the testing samples' tree embeddings, which are obtained by fitting a tree-based ensemble model on in-distribution training samples. Our approach is interpretable and robust owing to its tree-based nature. Furthermore, our approach is efficient, flexible across various machine learning tasks, and can be easily generalized to the unsupervised setting. Extensive experiments are…
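The mechanism described in the abstract (embed each test sample by the leaf it reaches in every tree, then compare embeddings by Hamming distance) can be illustrated with a small dependency-free sketch. This is not the authors' implementation. As an assumption, it swaps in completely random, unsupervised trees for the fitted ensemble, and the helper names (`build_tree`, `leaf_of`, `embed`, `mean_pairwise_hamming`), the tree depth, the forest size, and the toy data are all hypothetical choices made for illustration.

```python
import itertools
import random


def build_tree(points, depth, rng, next_id):
    """Grow one completely random tree (an assumed stand-in for a fitted tree).

    Internal nodes are (feature, threshold, left, right) tuples;
    leaves are integer ids that are unique within the tree.
    """
    if depth > 0:
        feature = rng.randrange(len(points[0]))
        values = [p[feature] for p in points]
        threshold = rng.uniform(min(values), max(values))
        left = [p for p in points if p[feature] <= threshold]
        right = [p for p in points if p[feature] > threshold]
        if left and right:
            return (feature, threshold,
                    build_tree(left, depth - 1, rng, next_id),
                    build_tree(right, depth - 1, rng, next_id))
    return next(next_id)  # depth exhausted or degenerate split: make a leaf


def leaf_of(tree, x):
    """Follow the splits until an integer leaf id is reached."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def embed(forest, x):
    """Tree embedding of x: the leaf id reached in each tree."""
    return tuple(leaf_of(tree, x) for tree in forest)


def hamming(a, b):
    """Fraction of trees in which two samples land in different leaves."""
    return sum(u != v for u, v in zip(a, b)) / len(a)


def mean_pairwise_hamming(forest, batch):
    """Average Hamming distance over all pairs of samples in the batch."""
    embeddings = [embed(forest, x) for x in batch]
    pairs = list(itertools.combinations(embeddings, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


rng = random.Random(0)

# Toy in-distribution training data: uniform on the unit square.
train = [(rng.random(), rng.random()) for _ in range(200)]
forest = [build_tree(train, 4, rng, itertools.count()) for _ in range(20)]

# One test batch from the training distribution, one far outside its range.
id_batch = [(rng.random(), rng.random()) for _ in range(50)]
ood_batch = [(10 + 0.1 * rng.random(), 10 + 0.1 * rng.random()) for _ in range(50)]

id_score = mean_pairwise_hamming(forest, id_batch)
ood_score = mean_pairwise_hamming(forest, ood_batch)
print(f"mean pairwise Hamming distance, ID batch:  {id_score:.3f}")
print(f"mean pairwise Hamming distance, OOD batch: {ood_score:.3f}")
```

In this toy setup every split threshold lies inside the training range, so all far-away samples follow the same path in every tree and their mean pairwise distance is exactly zero, while in-distribution samples scatter across leaves. This is the favorable case in which the OOD batch lies outside the training range in every coordinate. How a threshold on this score would be chosen is left open here.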
Peer Reviews
Decision·Submitted to ICLR 2024
Novel approach: The paper introduces a new mechanism, TOOD detection, which offers a novel perspective on determining whether testing samples have a similar distribution to training samples by means of a tree-based ensemble method. To my knowledge, tree structures and ensemble methods are seldom studied in OOD detection, making this direction an interesting line of work. Effective methodology: The proposed TOOD detection mechanism based on computing pa
The authors define OOD in the abstract, but this definition may depart from mainstream usage in the community. In my view, telling the difference between two distributions is more closely related to two-sample testing. In OOD detection, by contrast, we typically assume that the ID and OOD distributions have been mixed, so we need to classify each instance (point-wise) as ID or OOD. I think this setting is more difficult than a two-sample test, which is why OOD detection remains a challenging task in the literature. It will be
- The proposed detection method is novel and interesting.
- There is diversity in the experimental setup, considering OOD detection on multiple data types.
- Theoretical analysis is provided.
- This method is not valid for high-dimensional inputs.
- There are no experiments on ImageNet benchmark.
- The results of the theoretical analysis are for a single classification tree model, not for a random forest.
The strengths of this paper can be summarized as follows:
1. **Experimental Results:** The experiments demonstrated the proposal's strong performance across various synthetic and benchmark datasets.
2. **Clarity and Presentation:** The paper is meticulously structured, and the ideas presented are easily comprehensible, ensuring accessibility for readers.
While this paper exhibits several strengths, it also presents several weaknesses, which are outlined as follows:
1. **Limitation 1**: The idea that "out-of-distribution data may exhibit smaller Hamming distances among themselves" hinges on the assumption that the support of training and testing distributions does not overlap in each dimension. However, this assumption raises doubts as it prohibits anomalies from occurring in only one dimension.
2. **Limitation 2**: The central idea appears to
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Smart Grid Security and Resilience · Data Stream Mining Techniques
Methods: Sparse Evolutionary Training
