Random Intersection Trees
Rajen Dinesh Shah, Nicolai Meinshausen

TL;DR
Random Intersection Trees is a novel method for efficiently identifying high-order interactions in high-dimensional binary data. It uses a top-down approach that retains informative interactions with high probability while reducing computational cost.
Contribution
The paper introduces Random Intersection Trees, a new algorithm that efficiently detects variable interactions in high-dimensional data, outperforming brute-force methods in computational complexity.
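For context on the brute-force baseline: an exhaustive search over all interactions of order at most s among p binary predictors must consider roughly p^s variable subsets. The short sketch below is not from the paper. It only counts that search space, and the function names are illustrative.

```python
from math import comb


def brute_force_candidates(p: int, s: int) -> int:
    """Number of variable subsets of size exactly s among p binary predictors."""
    return comb(p, s)


def brute_force_up_to(p: int, s: int) -> int:
    """Number of candidate interactions of order 1..s, i.e. the search space
    a brute-force scan constrained to order s has to cover."""
    return sum(comb(p, k) for k in range(1, s + 1))


if __name__ == "__main__":
    for s in (2, 3, 4):
        print(s, brute_force_candidates(1000, s))
```

For p = 1000 and s = 3, the count is already 166,167,000 subsets. This growth is the cost the top-down procedure is designed to avoid.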
Findings
Retains informative interactions with high probability
Computational complexity of order p^κ, where κ can be as low as 1 for very sparse data (p is the number of predictor variables)
Uses min-wise hashing to further reduce costs
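The findings credit min-wise hashing with a further cost reduction, but this summary does not say exactly how it is applied. The sketch below is an assumption. It shows only the standard min-wise hashing identity that such a step could rely on: under a random permutation of the observations, all variables in a set S share the same minimum with probability equal to the number of observations where every variable in S is active, divided by the number where at least one is. Exact random permutations stand in for hash functions, and the function names are hypothetical.

```python
import random
from typing import FrozenSet, Iterable, List


def minhash_signatures(
    rows: List[FrozenSet[int]], n_vars: int, n_perm: int, seed: int = 0
) -> List[List[int]]:
    """sig[k][j] = smallest position, under random permutation j, of a row
    in which variable k equals 1.

    Assumes every variable is active in at least one row; otherwise its
    entries keep the sentinel value len(rows).
    """
    rng = random.Random(seed)
    n = len(rows)
    sig = [[n] * n_perm for _ in range(n_vars)]
    for j in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # perm[i] = position of row i under permutation j
        for i, row in enumerate(rows):
            for k in row:
                if perm[i] < sig[k][j]:
                    sig[k][j] = perm[i]
    return sig


def estimate_jaccard(sig: List[List[int]], variables: Iterable[int]) -> float:
    """Estimate |intersection of I_k| / |union of I_k| over the given variables,
    where I_k is the set of rows in which variable k equals 1."""
    ks = list(variables)
    n_perm = len(sig[ks[0]])
    # The first row of the union under a random permutation lies in every I_k
    # with probability |intersection| / |union|; exactly then all minima agree.
    hits = sum(
        all(sig[k][j] == sig[ks[0]][j] for k in ks[1:]) for j in range(n_perm)
    )
    return hits / n_perm
```

Signatures are computed once per variable. Scoring a candidate set then costs O(|S| × n_perm) instead of a pass over all observations.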
Abstract
Finding interactions between variables in large and high-dimensional datasets is often a serious computational challenge. Most approaches build up interaction sets incrementally, adding variables in a greedy fashion. The drawback is that potentially informative high-order interactions may be overlooked. Here, we propose an alternative approach for classification problems with binary predictor variables, called Random Intersection Trees. It works by starting with a maximal interaction that includes all variables, and then gradually removing variables if they fail to appear in randomly chosen observations of a class of interest. We show that informative interactions are retained with high probability, and the computational complexity of our procedure is of order p^κ for a value of κ that can reach values as low as 1 for very sparse data; in many more general settings, it…
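The top-down procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each observation is stored as the set of indices of its variables equal to 1, every tree has a fixed depth and a fixed number of children per node, and the pooled candidates are finally screened by a hypothetical prevalence threshold max_prevalence0 on the other class. All function and parameter names are illustrative.

```python
import random
from typing import FrozenSet, List, Set

Row = FrozenSet[int]  # indices of the variables equal to 1 in one observation


def random_intersection_tree(
    class1_rows: List[Row], depth: int, branching: int, rng: random.Random
) -> Set[Row]:
    """Grow one tree and return the non-empty interaction sets at its leaves."""
    # Root: intersecting "all variables" with one random class-1 observation
    # leaves exactly that observation's active variables.
    level = [rng.choice(class1_rows)]
    for _ in range(depth):
        next_level = []
        for node in level:
            for _ in range(branching):
                # Drop every variable not active in a freshly drawn observation.
                child = node & rng.choice(class1_rows)
                if child:  # an empty set can never regain variables, so prune it
                    next_level.append(child)
        level = next_level
    return set(level)


def candidate_interactions(
    class1_rows: List[Row],
    class0_rows: List[Row],
    n_trees: int,
    depth: int,
    branching: int,
    max_prevalence0: float,  # hypothetical screening threshold
    seed: int = 0,
) -> Set[Row]:
    """Pool leaves over several trees, then keep sets that are rare in class 0."""
    rng = random.Random(seed)
    found: Set[Row] = set()
    for _ in range(n_trees):
        found |= random_intersection_tree(class1_rows, depth, branching, rng)

    def prevalence(s: Row, rows: List[Row]) -> float:
        # Fraction of observations in which every variable of s equals 1.
        return sum(s <= r for r in rows) / len(rows)

    return {s for s in found if prevalence(s, class0_rows) <= max_prevalence0}
```

Because each intersection can only remove variables, a set of variables active in a large share of class-1 observations tends to survive along many root-to-leaf paths, whereas sets tied to only a few observations are quickly pruned.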
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification · Data Management and Algorithms
