Polar Encoding: A Simple Baseline Approach for Classification with Missing Values
Oliver Urs Lenz, Daniel Peralta, Chris Cornelis

TL;DR
Polar encoding offers a simple, effective baseline for handling missing data in classification tasks, preserving missingness information without imputation and performing competitively with advanced methods.
Contribution
The paper introduces polar encoding as a universal, easy-to-apply method that naturally incorporates missingness and unifies categorical and numerical attributes under a barycentric coordinate framework.
Findings
Outperforms MICE and MIDAS in classification accuracy on real datasets.
Does not require imputation, simplifying the data preprocessing pipeline.
Compatible with any classification algorithm, including decision trees.
Abstract
We propose polar encoding, a representation of categorical and numerical [0,1]-valued attributes with missing values to be used in a classification context. We argue that this is a good baseline approach, because it can be used with any classification algorithm, preserves missingness information, is very simple to apply and offers good performance. In particular, unlike the existing missing-indicator approach, it does not require imputation, ensures that missing values are equidistant from non-missing values, and lets decision tree algorithms choose how to split missing values, thereby providing a practical realisation of the "missingness incorporated in attributes" (MIA) proposal. Furthermore, we show that categorical and [0,1]-valued attributes can be viewed as special cases of a single attribute type, corresponding to the classical concept of barycentric coordinates, and that…
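The excerpt above does not spell out the encoding itself, so the following is only a minimal sketch under stated assumptions: a [0,1]-valued attribute x is mapped to the pair (x, 1 - x), a categorical attribute is one-hot encoded, and a missing value becomes the all-zero vector in both cases. The function names `polar_encode_numeric` and `polar_encode_categorical` are hypothetical and not taken from the paper. Under these assumptions, every non-missing numerical value lies at Manhattan distance 1 from a missing value, which illustrates the equidistance property mentioned in the abstract.

```python
import numpy as np


def polar_encode_numeric(x):
    """Assumed encoding: x in [0, 1] -> (x, 1 - x); NaN (missing) -> (0, 0)."""
    x = np.asarray(x, dtype=float)
    encoded = np.stack([x, 1.0 - x], axis=-1)
    # Rows that came from NaN inputs are replaced by the zero vector.
    encoded[np.isnan(x)] = 0.0
    return encoded


def polar_encode_categorical(values, categories):
    """Assumed encoding: one-hot rows; None (missing) -> all-zero row."""
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        if value is not None:
            encoded[row, categories.index(value)] = 1.0
    return encoded


# Three observed values and one missing value for a numerical attribute.
numeric = polar_encode_numeric([0.0, 0.25, 1.0, np.nan])

# Two observed values and one missing value for a categorical attribute.
categorical = polar_encode_categorical(
    ["red", None, "blue"], ["red", "green", "blue"]
)

# Manhattan distance from each observed numerical value to the missing one.
manhattan_to_missing = np.abs(numeric[:3] - numeric[3]).sum(axis=1)
```

Because a missing value is 0 in both columns of the numerical encoding, a threshold split on the first column groups it with low values, while a split on the second column groups it with high values. This is one way to read the claim that a decision tree can choose how to split missing values.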
Taxonomy
Topics: Advanced Graph Neural Networks · Text and Document Classification Technologies · Rough Sets and Fuzzy Logic
