Modified Entropy Measure for Detection of Association Rules Under Simpson's Paradox Context
Murphy Choy, Cally Claire Ong, Michelle Cheong

TL;DR
This paper introduces a modified entropy-based measure to detect hidden association rules in geographically and demographically diverse retail data, addressing challenges posed by Simpson's paradox in traditional models.
Contribution
It proposes a novel entropy measure with modified weighting to uncover association rules obscured by diversity effects, improving analysis in heterogeneous retail datasets.
Findings
Effective detection of hidden association rules demonstrated
Improved understanding of diversity effects on rule discovery
Case study confirms practical utility of the proposed measure
Abstract
The rapid explosion in retail data calls for more effective and efficient discovery of association rules to develop relevant business strategies and rules. Unlike online shopping sites, most brick-and-mortar retail shops are located in geographically and demographically diverse areas. This diversity presents a new challenge to the classical association rule model, which assumes a homogeneous group of customers behaving similarly. This paper focuses on the discovery of association rules that are hidden as a result of geographically and demographically diverse data. We introduce a novel measure that incorporates the entropy measure with modified weighting to detect association rules missed by the standard measures due to Simpson's paradox. The proposed measure is evaluated using a real-world case study involving a major retailer of fashion goods in…
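To make the Simpson's paradox effect described in the abstract concrete, the following minimal sketch shows how a rule "A implies B" can look positive inside every store yet negative once the stores are pooled. It uses the standard lift measure, not the paper's modified entropy measure. The two stores, the `Counts` class, the helper names, and all transaction counts are hypothetical values invented for illustration; none of them come from the paper's case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Transaction counts for one store (hypothetical toy data)."""
    n: int     # total transactions
    n_a: int   # transactions containing item A
    n_b: int   # transactions containing item B
    n_ab: int  # transactions containing both A and B


def lift(c: Counts) -> float:
    """Standard lift of A -> B: P(A and B) / (P(A) * P(B))."""
    return (c.n_ab * c.n) / (c.n_a * c.n_b)


def pool(*groups: Counts) -> Counts:
    """Merge stores by summing counts, as a pooled analysis would."""
    return Counts(
        n=sum(g.n for g in groups),
        n_a=sum(g.n_a for g in groups),
        n_b=sum(g.n_b for g in groups),
        n_ab=sum(g.n_ab for g in groups),
    )


# Assumed toy stores: A is popular in store 1, B is popular in store 2.
store_1 = Counts(n=1000, n_a=800, n_b=100, n_ab=90)
store_2 = Counts(n=1000, n_a=100, n_b=800, n_ab=90)
pooled = pool(store_1, store_2)

for name, c in [("store 1", store_1), ("store 2", store_2), ("pooled", pooled)]:
    print(f"{name}: lift = {lift(c):.3f}")
```

With these assumed counts, each store has a lift of 1.125 (A and B co-occur more often than independence predicts), while the pooled lift is about 0.444. A pooled analysis would therefore miss, or even reverse, a rule that holds in both stores, which is the kind of hidden rule the abstract refers to.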
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Imbalanced Data Classification Techniques
