DI2: prior-free and multi-item discretization of biomedical data and its applications

Leonardo Alexandre; Rafael S. Costa; Rui Henriques

arXiv:2103.04356·q-bio.QM·March 9, 2021

TL;DR

DI2 is an unsupervised, prior-free discretization method for biomedical data that handles skewed distributions and supports multi-item assignments, improving upon existing approaches without relying on strict statistical assumptions.

Contribution

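The paper introduces DI2, a discretization approach that is prior-free, handles skewed distributions, and supports multi-item assignments. It targets two limitations named in the abstract: existing methods rely on strict statistical assumptions, and there are no reference principles for multi-item discretization.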

Findings

01. DI2 improves discretization accuracy on biomedical datasets.

02. It provides robust generalization guarantees using the Kolmogorov-Smirnov test.

03. DI2 outperforms traditional discretization methods in diverse biomedical scenarios.
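
The Kolmogorov-Smirnov test named in finding 02 compares a sample's empirical distribution with a candidate distribution. The sketch below only illustrates the one-sample KS statistic in plain Python. How DI2 applies the test is not described in this summary, and `ks_statistic` and `normal_cdf` are hypothetical helper names, not functions from the DI2 repository.

```python
from math import erf, sqrt
from statistics import mean, stdev


def normal_cdf(x, mu, sigma):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


def ks_statistic(sample, cdf):
    """One-sample KS statistic: the largest gap between the empirical CDF
    of `sample` and the candidate `cdf` (hypothetical helper)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Illustrative, made-up right-skewed values (not data from the paper).
skewed = [0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.9, 1.8, 3.5, 7.0]
mu, sigma = mean(skewed), stdev(skewed)
d_normal = ks_statistic(skewed, lambda x: normal_cdf(x, mu, sigma))
print(f"KS distance to the fitted normal: {d_normal:.3f}")
```

A larger statistic means the candidate distribution describes the sample less well, which is one way a poor fit on a skewed variable could be detected.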

Abstract

Motivation: A considerable number of data mining approaches for biomedical data analysis, including state-of-the-art associative models, require a form of data discretization. Although diverse discretization approaches have been proposed, they generally work under a strict set of statistical assumptions which are arguably insufficient to handle the diversity and heterogeneity of clinical and molecular variables within a given dataset. In addition, although an increasing number of symbolic approaches in bioinformatics are able to assign multiple items to values occurring near discretization boundaries for superior robustness, there are no reference principles on how to perform multi-item discretizations.

Results: In this study, an unsupervised discretization method, DI2, for variables with arbitrarily skewed distributions is proposed. DI2 provides robust guarantees of generalization by…
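
The idea of multi-item discretization mentioned in the abstract can be sketched as follows: a value far from every cut point receives one item, while a value near a boundary receives the items on both sides of it. This is a minimal illustration under stated assumptions, not DI2's procedure. The cut points come from empirical quantiles as a stand-in, and `quantile_cuts`, `assign_items`, and the absolute `margin` parameter are hypothetical names chosen for this example.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_cuts(values, n_items):
    """Equal-frequency cut points: n_items - 1 interior boundaries.
    Assumption: empirical quantiles stand in for the real cut-point rule."""
    return quantiles(values, n=n_items)


def assign_items(value, cuts, margin):
    """Return the sorted list of items (bin indices) assigned to `value`.

    cuts[k] separates item k from item k + 1. A value within `margin`
    of a cut point is assigned both neighbouring items.
    """
    items = {bisect_right(cuts, value)}  # primary item
    for k, cut in enumerate(cuts):
        if abs(value - cut) <= margin:
            items.update((k, k + 1))
    return sorted(items)


# Illustrative use with hand-picked cut points (made-up numbers).
cuts = [10.0, 20.0]
for v in (5.0, 9.5, 15.0, 20.0, 25.0):
    print(v, assign_items(v, cuts, margin=1.0))
```

Using an absolute margin keeps the example short; a width-relative or distribution-based border rule would be an equally plausible choice.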

Code & Models

Repositories

JupitersMight/DI2 (official)

Taxonomy

Topics: Data Mining Algorithms and Applications · Rough Sets and Fuzzy Logic · Machine Learning and Data Classification