Searching for significant patterns in stratified data
Felipe Llinares-Lopez, Laetitia Papaxanthos, Dean Bodenham, Karsten Borgwardt

TL;DR
This paper introduces a new method and algorithm for significant pattern mining in stratified data, effectively accounting for categorical covariates to improve statistical testing in biomedical research.
Contribution
It presents a novel strategy and efficient algorithm for pattern mining that incorporates categorical covariates, addressing limitations of previous testability-based methods.
Findings
Developed an algorithm for stratified pattern mining
Enhanced statistical power in biomedical data analysis
Successfully tested on real-world datasets
Abstract
Significant pattern mining, the problem of finding itemsets that are significantly enriched in one class of objects, is statistically challenging, as the large space of candidate patterns leads to an enormous multiple testing problem. Recently, the concept of testability was proposed as one approach to correct for multiple testing in pattern mining while retaining statistical power. Still, these strategies based on testability do not allow one to condition the test of significance on the observed covariates, which severely limits their utility in biomedical applications. Here we propose a strategy and an efficient algorithm to perform significant pattern mining in the presence of categorical covariates with K states.
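To illustrate what conditioning a test on a categorical covariate with K states can look like, here is a minimal sketch of the Cochran-Mantel-Haenszel (CMH) test for K stratified 2x2 tables. The abstract above does not name the test the authors use, so the choice of CMH is an assumption, as are the function name `cmh_test` and the per-stratum tuple layout `(a, x, n1, n)`. The toy counts are invented for illustration only.

```python
import math

def cmh_test(tables):
    """Cochran-Mantel-Haenszel test without continuity correction.

    tables: one tuple (a, x, n1, n) per stratum (hypothetical layout), where
      a  = objects in class 1 that contain the pattern,
      x  = objects that contain the pattern,
      n1 = objects in class 1,
      n  = all objects in the stratum.
    Returns (statistic, p_value); the statistic is asymptotically
    chi-square distributed with 1 degree of freedom under the null.
    """
    deviation = 0.0  # sum over strata of observed minus expected count
    variance = 0.0   # sum over strata of hypergeometric variances
    for a, x, n1, n in tables:
        if n < 2:
            continue  # a stratum with fewer than 2 objects carries no information
        deviation += a - x * n1 / n
        variance += n1 * (n - n1) * x * (n - x) / (n * n * (n - 1))
    if variance == 0.0:
        return 0.0, 1.0  # degenerate margins: the pattern is untestable
    stat = deviation * deviation / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Toy data: the covariate drives both the class label and the pattern.
strata = [(16, 20, 20, 25), (1, 5, 5, 25)]
# Ignoring the covariate pools both strata into a single table.
pooled = [tuple(sum(col) for col in zip(*strata))]

stat_pooled, p_pooled = cmh_test(pooled)
stat_strat, p_strat = cmh_test(strata)
print(f"pooled:     stat={stat_pooled:.3f}, p={p_pooled:.4f}")
print(f"stratified: stat={stat_strat:.3f}, p={p_strat:.4f}")
```

In this toy example the pooled table shows an association between pattern and class, while within each stratum the observed count equals its expectation, so the stratified statistic is zero. This is the kind of covariate-induced spurious association that an unconditioned test cannot distinguish from a genuine one.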
Taxonomy
Topics: Data Mining Algorithms and Applications · Genetic Associations and Epidemiology · Genetic Mapping and Diversity in Plants and Animals
