Feature Encodings for Gradient Boosting with Automunge
Nicholas J. Teague

TL;DR
This study benchmarks Automunge's default feature encodings for gradient boosting against alternative encodings across diverse datasets, comparing them on tuning duration and predictive performance.
Contribution
The paper validates Automunge's default encoding choices for gradient boosting and provides comprehensive benchmarks comparing different encoding strategies.
Findings
On average, Automunge defaults were top performers in both tuning duration and model performance
Categoric binarization is more suitable than one-hot encoding as a default
Default encodings are effective across diverse datasets
Abstract
Automunge is a tabular preprocessing library that encodes dataframes for supervised learning. When selecting a default feature encoding strategy for gradient boosted learning, one may consider the training duration and the predictive performance associated with each feature representation. Automunge defaults to binarization for categoric features and z-score normalization for numeric features. The presented study sought to validate those defaults by benchmarking encoding variations on a series of diverse data sets with tuned gradient boosted learning. We found that on average our chosen defaults were top performers from both a tuning duration and a model performance standpoint. Another key finding was that one-hot encoding did not perform well enough to serve as a categoric default when compared with categoric binarization. We present here…
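The two default encodings named in the abstract can be illustrated with a minimal sketch. This is not Automunge's implementation or API. The helper names `zscore` and `binarize`, the toy dataframe, and the choice to reserve code 0 for missing values are all assumptions made for illustration, using only pandas.

```python
import pandas as pd


def zscore(series: pd.Series) -> pd.Series:
    """Z-score normalize a numeric column: (x - mean) / std."""
    std = series.std(ddof=0)
    if std == 0:
        # A constant column carries no spread; just center it.
        return series - series.mean()
    return (series - series.mean()) / std


def binarize(series: pd.Series) -> pd.DataFrame:
    """Encode each category as the binary digits of an integer code."""
    categories = sorted(series.dropna().unique())
    # Assumption of this sketch: code 0 is reserved for missing values.
    mapping = {c: i + 1 for i, c in enumerate(categories)}
    codes = series.map(mapping).fillna(0).astype(int)
    # Number of bits needed to represent the largest code.
    width = max(1, len(categories).bit_length())
    bits = {
        f"{series.name}_bit{b}": (codes.to_numpy() >> b) & 1
        for b in range(width)
    }
    return pd.DataFrame(bits, index=series.index)


# Hypothetical toy data: one categoric and one numeric feature.
df = pd.DataFrame({
    "color": ["red", "green", "blue", "green", "teal", "pink"],
    "size": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

binary = binarize(df["color"])          # 3 columns for 5 categories
onehot = pd.get_dummies(df["color"])    # 5 columns for 5 categories
scaled = zscore(df["size"])

print(binary.shape[1], onehot.shape[1])
```

With five distinct categories, the binarized form needs three columns where one-hot encoding needs five, and the gap widens as cardinality grows. A narrower feature set is one plausible reason an encoding could affect tuning duration, though the sketch itself does not measure that.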
Taxonomy
Topics: Machine Learning and Data Classification · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning
Methods: Lib
