Weight-of-evidence 2.0 with shrinkage and spline-binning
Jakob Raymaekers, Wouter Verbeke, Tim Verdonck

TL;DR
This paper extends the weight-of-evidence method with spline-based binning and shrinkage estimators, aiming to improve classification performance and interpretability when predictors are high-cardinality categorical variables or relate non-linearly to the target.
Contribution
It proposes a formal, data-driven extension to weight-of-evidence that captures non-linear effects and reduces overfitting, demonstrated through fraud detection experiments.
Findings
Improved classification precision in fraud detection
Effective handling of high-cardinality categorical variables
Reduced overfitting through shrinkage estimators
Abstract
In many practical applications, such as fraud detection, credit risk modeling or medical decision making, classification models for assigning instances to a predefined set of classes are required to be both precise and interpretable. Linear modeling methods such as logistic regression are often adopted, since they offer an acceptable balance between precision and interpretability. Linear methods, however, are not well equipped to handle categorical predictors with high cardinality or to exploit non-linear relations in the data. As a solution, data preprocessing methods such as weight-of-evidence are typically used for transforming the predictors. The binning procedure that underlies the weight-of-evidence approach, however, has been little researched and typically relies on ad-hoc or expert-driven procedures. The objective in this paper, therefore, is to propose a formalized,…
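For readers unfamiliar with the transformation the abstract refers to, the following is a minimal sketch of classical weight-of-evidence encoding for a single categorical predictor. It is not the estimator proposed in the paper: the function name `woe_encode`, the `pseudo_count` parameter, and the additive-smoothing form of shrinkage are assumptions chosen for illustration, and the sign convention (log of the share among positives over the share among negatives) is one of several in use.

```python
import math
from collections import Counter


def woe_encode(categories, labels, pseudo_count=0.0):
    """Return {level: WoE} with WoE = log(P(level | y=1) / P(level | y=0)).

    pseudo_count (hypothetical parameter) adds that many virtual
    observations to each class, spread evenly over the levels. Larger
    values pull every WoE toward 0, a simple stand-in for shrinkage.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos = Counter(c for c, y in zip(categories, labels) if y == 1)
    neg = Counter(c for c, y in zip(categories, labels) if y == 0)
    levels = sorted(set(categories))
    per_level = pseudo_count / len(levels)

    woe = {}
    for level in levels:
        # Smoothed share of this level among positives and among negatives.
        p = (pos[level] + per_level) / (n_pos + pseudo_count)
        q = (neg[level] + per_level) / (n_neg + pseudo_count)
        # Without smoothing, a level seen in only one class makes p or q
        # zero and this line raises an error.
        woe[level] = math.log(p / q)
    return woe


# Toy data: level "a" is mostly positive, level "b" mostly negative.
cats = ["a", "a", "a", "a", "b", "b", "b", "b"]
ys = [1, 1, 1, 0, 1, 0, 0, 0]

raw = woe_encode(cats, ys)                       # no shrinkage
shrunk = woe_encode(cats, ys, pseudo_count=4.0)  # pulled toward 0
```

The encoded values can then replace the categorical column as a single numeric input to a linear model such as logistic regression. With few observations per level, the unsmoothed estimates are noisy, which is the overfitting risk that shrinkage is meant to reduce.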
Taxonomy
Topics: Machine Learning in Healthcare · Financial Distress and Bankruptcy Prediction · Imbalanced Data Classification Techniques
Methods: Logistic Regression
