A new algorithm for Subgroup Set Discovery based on Information Gain
Daniel Gómez-Bravo, Aaron García, Guillermo Vigueras, Belén Ríos, Alejandro Rodríguez-González

TL;DR
This paper introduces IGSD, a novel pattern discovery algorithm that combines Information Gain and Odds Ratio, outperforming existing methods in reliability, relevance, and expert validation across multiple datasets.
Contribution
The paper presents IGSD, a new subgroup discovery algorithm that integrates multiple criteria and addresses limitations of prior algorithms, enhancing pattern reliability and interpretability.
Findings
IGSD outperforms FSSD and SSD++ in pattern reliability and quantity.
IGSD yields higher Odds Ratio values, indicating stronger pattern-target dependence.
Patterns from IGSD align better with domain experts' assessments.
Abstract
Pattern discovery is a machine learning technique that aims to find sets of items, subsequences, or substructures that occur in a dataset more frequently than a manually set threshold. This process helps to identify recurring patterns or relationships within the data, allowing for valuable insights and knowledge extraction. In this work, we propose Information Gained Subgroup Discovery (IGSD), a new Subgroup Discovery (SD) algorithm for pattern discovery that combines Information Gain (IG) and Odds Ratio (OR) as multiple criteria for pattern selection. The algorithm tries to tackle some limitations of state-of-the-art SD algorithms, such as the need to fine-tune key parameters for each dataset, the use of a single, manually set pattern search criterion, the use of non-overlapping data structures for subgroup space exploration, and the inability to search for patterns by fixing some…
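As a rough illustration of the two criteria named in the abstract, the sketch below computes Information Gain and Odds Ratio for a single candidate subgroup from its 2x2 contingency table against a binary target. This is not the IGSD algorithm itself: the function names, the cell naming (tp, fp, fn, tn), and the 0.5 continuity correction for empty cells are assumptions made for this example, and how IGSD actually combines or thresholds the two measures is not described in the text above.

```python
import math


def entropy(pos, neg):
    """Shannon entropy (in bits) of a binary class distribution given counts."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(tp, fp, fn, tn):
    """Entropy reduction of the target when splitting on subgroup membership.

    tp/fp: covered instances with positive/negative target.
    fn/tn: uncovered instances with positive/negative target.
    (Hypothetical naming, chosen for this sketch.)
    """
    n = tp + fp + fn + tn
    if n == 0:
        return 0.0
    h_before = entropy(tp + fn, fp + tn)
    h_after = ((tp + fp) / n) * entropy(tp, fp) + ((fn + tn) / n) * entropy(fn, tn)
    return h_before - h_after


def odds_ratio(tp, fp, fn, tn, correction=0.5):
    """Odds of a positive target inside the subgroup relative to outside it.

    Assumption: if any cell is zero, `correction` is added to every cell
    to avoid division by zero (a common continuity correction).
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))
    return (tp * tn) / (fp * fn)


# Example: a subgroup covering 40 of 100 instances, 30 of them positive.
ig = information_gain(30, 10, 20, 40)
oratio = odds_ratio(30, 10, 20, 40)
print(f"IG = {ig:.4f} bits, OR = {oratio:.2f}")
```

An Odds Ratio above 1 indicates that the positive target is more likely inside the subgroup than outside it, while Information Gain measures how much knowing subgroup membership reduces uncertainty about the target.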
Taxonomy
Topics: Data Mining Algorithms and Applications
