Learning High-Order Interactions via Targeted Pattern Search
Michela C. Massi, Nicola R. Franco, Francesca Ieva, Andrea Manzoni, Anna Maria Paganoni, Paolo Zunino

TL;DR
This paper introduces LIPS, a novel algorithm that efficiently selects high-order interaction terms for logistic regression models in imbalanced categorical data scenarios, improving predictive accuracy and interpretability.
Contribution
LIPS combines frequent item set mining with a dissimilarity-based selection to effectively include high-order interactions in logistic regression for complex, imbalanced datasets.
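To make the two ingredients named above concrete, here is a minimal Python sketch that mines frequent item sets and then filters them by dissimilarity. It is an illustration under stated assumptions, not the authors' procedure: the brute-force miner, the choice to mine only minority-class rows, the Jaccard dissimilarity on pattern covers, the greedy filter, and every function name and threshold are hypothetical choices made for this example. The summary does not specify the dissimilarity measure LIPS actually uses.

```python
from itertools import combinations


def mine_frequent_itemsets(rows, min_support, max_order):
    """Brute-force frequent item set mining over (variable, value) items.

    Returns a dict mapping each frequent pattern to its cover, i.e. the
    set of row indices matching every item in the pattern.
    """
    n = len(rows)
    items = sorted({(k, v) for row in rows for k, v in row.items()})
    frequent = {}
    for order in range(1, max_order + 1):
        for combo in combinations(items, order):
            # A pattern may use each variable at most once.
            if len({k for k, _ in combo}) < order:
                continue
            cover = frozenset(
                i for i, row in enumerate(rows)
                if all(row.get(k) == v for k, v in combo)
            )
            if len(cover) / n >= min_support:
                frequent[combo] = cover
    return frequent


def jaccard_dissimilarity(a, b):
    """1 - |a & b| / |a | b| (assumed measure, not taken from the paper)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def select_dissimilar(frequent, min_dissimilarity):
    """Greedily keep patterns whose covers differ enough from those kept."""
    selected = []
    # Visit larger covers first; break ties by pattern for determinism.
    ordered = sorted(frequent.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for pattern, cover in ordered:
        if all(
            jaccard_dissimilarity(cover, frequent[p]) >= min_dissimilarity
            for p in selected
        ):
            selected.append(pattern)
    return selected


def interaction_features(rows, patterns):
    """One 0/1 column per pattern: 1 when a row matches all its items."""
    return [
        [int(all(row.get(k) == v for k, v in p)) for p in patterns]
        for row in rows
    ]


# Toy minority-class rows (hypothetical data).
minority = [
    {"a": "x", "b": "y", "c": "z"},
    {"a": "x", "b": "y", "c": "w"},
    {"a": "x", "b": "q", "c": "z"},
]
frequent = mine_frequent_itemsets(minority, min_support=0.6, max_order=2)
selected = select_dissimilar(frequent, min_dissimilarity=0.3)
features = interaction_features(minority, selected)
```

Each selected pattern becomes one binary column, so a pattern with two items plays the role of a second-order interaction term. The resulting columns could then be passed to any logistic regression fit alongside the main effects.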
Findings
LIPS outperforms state-of-the-art algorithms in real-world tests.
The method effectively handles high-dimensional categorical data.
Variants of LIPS address specific research needs.
Abstract
Logistic Regression (LR) is a widely used statistical method in empirical binary classification studies. However, real-life scenarios often share complexities that prevent the use of the LR model as-is and instead highlight the need to include high-order interactions to capture data variability. This becomes even more challenging because of: (i) datasets growing wider, with more and more variables; (ii) studies typically being conducted in strongly imbalanced settings; (iii) sample sizes ranging from very large to extremely small; (iv) the need to provide both predictive models and interpretable results. In this paper we present a novel algorithm, Learning high-order Interactions via targeted Pattern Search (LIPS), to select interaction terms of varying order to include in an LR model for an imbalanced binary classification task when input data are categorical. LIPS's rationale…
Taxonomy
Topics: Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques · Genetic Associations and Epidemiology
