Separate and conquer heuristic allows robust mining of contrast sets in classification, regression, and survival data
Adam Gudyś, Marek Sikora, Łukasz Wróbel

TL;DR
This paper introduces RuleKit-CS, a novel contrast set mining algorithm based on separate and conquer, capable of handling classification, regression, and survival data, with demonstrated effectiveness across diverse datasets.
Contribution
The paper presents a generalized contrast set mining algorithm that extends separate and conquer to regression and survival data, incorporating attribute penalization for better group differentiation.
Findings
Effective in discovering differences across various data types
Validated on over 130 datasets from multiple domains
Available as open-source software for broader use
Abstract
Identifying differences between groups is one of the most important knowledge discovery problems. The procedure, also known as contrast set mining, is applied in a wide range of areas such as medicine, industry, and economics. In this paper we present RuleKit-CS, an algorithm for contrast set mining based on separate and conquer, a well-established heuristic for decision rule induction. Multiple passes combined with an attribute penalization scheme yield contrast sets that describe the same examples with different attributes, which distinguishes the presented approach from standard separate and conquer. The algorithm was also generalized to regression and survival data, allowing identification of contrast sets whose label attribute/survival prognosis is consistent with the label/prognosis for the predefined contrast groups. This feature, not provided by the existing approaches, further extends…
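The idea of separate and conquer with multiple passes and attribute penalization can be illustrated with a small sketch. This is not the RuleKit-CS implementation: the function names, the greedy precision-based quality measure, and the penalty formula (dividing precision by one plus the number of earlier uses of an attribute) are all assumptions made for illustration, and only categorical attributes in a classification-style setting are handled.

```python
def grow_rule(examples, groups, target, uncovered, uses):
    """Greedily add attribute=value conditions (hypothetical helper).

    The score is precision for the target group divided by (1 + earlier uses
    of the attribute). This penalty is an assumed stand-in, not the scheme
    used in the paper.
    """
    rule = {}
    covered = set(range(len(examples)))
    best_score = 0.0
    while True:
        best = None
        for attr in examples[0]:
            if attr in rule:
                continue
            for value in sorted({examples[i][attr] for i in covered}):
                cov = {i for i in covered if examples[i][attr] == value}
                # A candidate must cover at least one still-uncovered target example.
                if not any(i in uncovered for i in cov if groups[i] == target):
                    continue
                precision = sum(groups[i] == target for i in cov) / len(cov)
                score = precision / (1 + uses.get(attr, 0))
                if score > best_score:
                    best_score, best = score, (attr, value, cov)
        if best is None:
            return rule, covered
        attr, value, covered = best
        rule[attr] = value


def mine_contrast_sets(examples, groups, target, passes=2):
    """Run several separate-and-conquer passes for one contrast group."""
    uses = {}    # how often each attribute appeared in earlier rules
    result = []
    for _ in range(passes):
        # Each pass starts again from all examples of the target group.
        uncovered = {i for i, g in enumerate(groups) if g == target}
        while uncovered:
            rule, covered = grow_rule(examples, groups, target, uncovered, uses)
            if not rule:
                break
            if rule not in result:
                result.append(rule)
            for attr in rule:
                uses[attr] = uses.get(attr, 0) + 1
            uncovered -= covered  # "separate": drop the examples just covered
    return result


# Toy data: two perfectly correlated attributes, two groups.
examples = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
    {"color": "blue", "shape": "square"},
]
groups = ["A", "A", "B", "B"]

contrast_sets = mine_contrast_sets(examples, groups, target="A", passes=2)
print(contrast_sets)
```

On this toy data the first pass describes group A with color = red. In the second pass color is penalized, so the same examples are described with shape = round instead, mirroring the behaviour the abstract attributes to multiple passes with penalization.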
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic
