On the Effectiveness of Discretizing Quantitative Attributes in Linear Classifiers

Nayyar A. Zaidi; Yang Du; Geoffrey I. Webb

arXiv:1701.07114 · cs.LG · January 26, 2017 · 5 cites

TL;DR

Discretizing quantitative attributes can significantly improve the accuracy of linear classifiers by reducing their representation bias, especially on large datasets, as demonstrated through empirical analysis on multiple benchmarks.

Contribution

This study systematically evaluates how discretization enhances various linear classifiers' performance, extending previous findings from naive Bayes to logistic regression, SVMs, and neural networks.

Findings

01. Discretization greatly improves classifier accuracy on large datasets.

02. Linear classifiers benefit from reduced representation bias due to discretization.

03. Empirical results on 42 benchmark datasets support the effectiveness of discretization.

Abstract

Learning algorithms that learn linear models often have high representation bias on real-world problems. In this paper, we show that this representation bias can be greatly reduced by discretization. Discretization is a common procedure in machine learning that is used to convert a quantitative attribute into a qualitative one. It is often motivated by the limitation of some learners to qualitative data. Discretization loses information, as fewer distinctions between instances are possible using discretized data relative to undiscretized data. In consequence, where discretization is not essential, it might appear desirable to avoid it. However, it has been shown that discretization often substantially reduces the error of the linear generative Bayesian classifier naive Bayes. This motivates a systematic study of the effectiveness of discretizing quantitative attributes for other linear…
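The representation-bias argument in the abstract can be illustrated with a small sketch. The code below is not the paper's experimental procedure: the equal-frequency binning scheme, the toy data, the hand-set weights, and all function names are assumptions chosen for illustration. It shows that a linear score over one-hot bin indicators can represent a class region that no single threshold on the raw quantitative value can.

```python
from bisect import bisect_right


def equal_frequency_cut_points(values, n_bins):
    """Return up to n_bins - 1 cut points that split values into roughly equal-sized bins."""
    ordered = sorted(values)
    cuts = []
    for k in range(1, n_bins):
        cut = ordered[(k * len(ordered)) // n_bins]
        if not cuts or cut > cuts[-1]:  # drop repeated cut points
            cuts.append(cut)
    return cuts


def one_hot_bin(value, cuts):
    """Encode a quantitative value as a one-hot vector over its bins."""
    index = bisect_right(cuts, value)  # bin i covers cuts[i-1] <= value < cuts[i]
    return [1.0 if i == index else 0.0 for i in range(len(cuts) + 1)]


def linear_predict(features, weights, bias=0.0):
    """Predict class 1 when the linear score is positive."""
    score = bias + sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else 0


def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def best_raw_threshold_accuracy(xs, ys):
    """Best accuracy of any linear rule (a single threshold) on the raw attribute."""
    best = 0.0
    for t in sorted(set(xs)) + [float("inf")]:
        for direction in (1, -1):  # predict 1 above t, or predict 1 below t
            preds = [1 if direction * (x - t) >= 0 else 0 for x in xs]
            best = max(best, accuracy(preds, ys))
    return best


# Toy data (assumed): class 1 occupies the middle interval [-1, 1).
xs = [i / 4 for i in range(-12, 12)]
ys = [1 if -1 <= x < 1 else 0 for x in xs]

cuts = equal_frequency_cut_points(xs, n_bins=3)
weights = [-1.0, 1.0, -1.0]  # hand-set for illustration, one weight per bin
binned_preds = [linear_predict(one_hot_bin(x, cuts), weights) for x in xs]

binned_accuracy = accuracy(binned_preds, ys)
raw_accuracy = best_raw_threshold_accuracy(xs, ys)
print(f"cut points: {cuts}")
print(f"linear model on one-hot bins: {binned_accuracy:.3f}")
print(f"best linear rule on raw value: {raw_accuracy:.3f}")
```

On this toy data the binned model classifies all 24 points correctly, while no threshold on the raw value gets more than 16 of 24 right. In practice the per-bin weights would be learned, for example by logistic regression, rather than set by hand.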

Code & Models

Repositories

vedic-partap/Discretization

Taxonomy

Topics: Neural Networks and Applications · Statistical and Computational Modeling · Data Mining Algorithms and Applications

Methods: Logistic Regression