TL;DR
This paper introduces the Quantile Encoder, a novel method for encoding high-cardinality categorical features in regression tasks, which improves performance especially with skewed distributions by using quantiles and smoothing.
Contribution
The paper proposes a specialized quantile-based encoding technique for categorical features in regression, outperforming traditional encoders and including methods to reduce overfitting.
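As a rough illustration of the idea, the sketch below encodes each category by a quantile of its target values and shrinks that value toward the global quantile when the category has few samples. The additive smoothing form, the function name `quantile_encode`, and the parameters `q` and `m` are assumptions made for this example; the summary does not state the paper's exact formula.

```python
import pandas as pd

def quantile_encode(categories, target, q=0.5, m=1.0):
    """Map each category to a smoothed q-quantile of its target values.

    Assumed smoothing (not confirmed by the summary):
        (n_c * quantile_c + m * global_quantile) / (n_c + m)
    where n_c is the category's sample count and m controls shrinkage.
    """
    df = pd.DataFrame({"cat": categories, "y": target})
    global_q = df["y"].quantile(q)       # fallback for rare or unseen categories
    grouped = df.groupby("cat")["y"]
    n = grouped.count()                  # support of each category
    cat_q = grouped.quantile(q)          # per-category quantile
    encoding = (n * cat_q + m * global_q) / (n + m)
    return encoding.to_dict(), global_q

# Toy data: category "a" contains an outlier (100), category "b" has one sample.
cats = ["a", "a", "a", "b"]
y = [1, 2, 100, 10]
enc, fallback = quantile_encode(cats, y, q=0.5, m=1.0)
```

With `q=0.5` the outlier in category "a" barely moves the encoding, whereas a mean-based encoding would be pulled toward it. The single-sample category "b" is shrunk halfway toward the global median because `m` equals its support.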
Findings
Outperforms mean target encoder in MAE
Effective with skewed and long-tailed distributions
Enhanced with feature expansion using multiple quantiles
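The feature expansion mentioned in the last finding can be sketched as one encoded column per quantile, so a downstream model sees several points of each category's target distribution instead of a single summary. The helper name `multi_quantile_features`, the column naming scheme, and the choice of quantiles are hypothetical, and smoothing is omitted here for brevity.

```python
import pandas as pd

def multi_quantile_features(df, cat_col, target_col, quantiles=(0.25, 0.5, 0.75)):
    """Return one column per quantile of the target, grouped by category."""
    grouped = df.groupby(cat_col)[target_col]
    out = pd.DataFrame(index=df.index)
    for q in quantiles:
        per_cat = grouped.quantile(q)    # Series indexed by category
        out[f"{cat_col}_q{round(q * 100)}"] = df[cat_col].map(per_cat)
    return out

# Toy data with hypothetical column names.
train = pd.DataFrame({
    "city": ["x", "x", "x", "y", "y"],
    "price": [1, 2, 3, 10, 20],
})
feats = multi_quantile_features(train, "city", "price")
```

In practice the per-category quantiles would be fitted on training data only and then applied to held-out rows, to avoid leaking target information.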
Abstract
Regression problems have been widely studied in the machine learning literature, resulting in a plethora of regression models and performance measures. However, there are few techniques specifically dedicated to the problem of incorporating categorical features into regression problems. Usually, categorical feature encoders are general enough to cover both classification and regression problems. This lack of specificity results in underperforming regression models. In this paper, we provide an in-depth analysis of how to tackle high-cardinality categorical features with the quantile. Our proposal outperforms state-of-the-art encoders, including the traditional statistical mean target encoder, when considering the Mean Absolute Error, especially in the presence of long-tailed or skewed distributions. Besides, to deal with possible overfitting when there are categories with small support, our…
