DI2: prior-free and multi-item discretization of biomedical data and its applications
Leonardo Alexandre, Rafael S. Costa, Rui Henriques

TL;DR
DI2 is an unsupervised, prior-free discretization method for biomedical data that handles skewed distributions and supports multi-item assignments, improving upon existing approaches without relying on strict statistical assumptions.
Contribution
The paper introduces DI2, a novel discretization approach that is prior-free, handles skewed distributions, and supports multi-item assignments, addressing limitations of existing methods.
Findings
DI2 improves discretization accuracy on biomedical datasets.
It provides robust generalization guarantees using the Kolmogorov-Smirnov test.
DI2 outperforms traditional discretization methods in diverse biomedical scenarios.
Abstract
Motivation: A considerable number of data mining approaches for biomedical data analysis, including state-of-the-art associative models, require a form of data discretization. Although diverse discretization approaches have been proposed, they generally work under a strict set of statistical assumptions which are arguably insufficient to handle the diversity and heterogeneity of clinical and molecular variables within a given dataset. In addition, although an increasing number of symbolic approaches in bioinformatics are able to assign multiple items to values occurring near discretization boundaries for superior robustness, there are no reference principles on how to perform multi-item discretizations. Results: In this study, an unsupervised discretization method, DI2, for variables with arbitrarily skewed distributions is proposed. DI2 provides robust guarantees of generalization by…
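To make the idea of multi-item assignment concrete, the following is a minimal sketch, not the DI2 algorithm itself. It assumes cut points taken from empirical quantiles (a stand-in for whatever distribution fitting DI2 performs) and assigns both neighbouring items to any value that falls within a tolerance of a cut point. The function name `multi_item_discretize` and the parameters `n_bins` and `border_fraction` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def multi_item_discretize(values, n_bins=3, border_fraction=0.1):
    """Equal-frequency discretization with two items near each cut point.

    Illustrative helper only (not DI2): cut points are empirical quantiles,
    and border_fraction is an assumed tolerance expressed as a fraction of
    the observed value range.
    """
    cuts = quantiles(values, n=n_bins)  # yields n_bins - 1 cut points
    tolerance = border_fraction * (max(values) - min(values))
    result = []
    for v in values:
        # Primary item: index of the bin that contains v.
        items = {bisect_right(cuts, v)}
        # Cut point i separates bins i and i + 1; a value close to it
        # receives both items.
        for i, c in enumerate(cuts):
            if abs(v - c) <= tolerance:
                items.update((i, i + 1))
        result.append(sorted(items))
    return result


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    for value, items in zip(data, multi_item_discretize(data, n_bins=2)):
        print(value, items)
```

Values far from a boundary keep a single item, while values near a boundary carry two, which is the robustness property the abstract attributes to multi-item symbolic approaches. Quantile-based cut points follow the empirical distribution, so they need no distributional prior, but they provide none of the statistical guarantees described for DI2.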
Taxonomy
Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Machine Learning and Data Classification
