Statistical Hypothesis Testing for Information Value (IV)
Helder Rojas, Cirilo Alvarez, Nilton Rojas

TL;DR
This paper introduces a hypothesis testing framework for Information Value (IV). It connects IV to Jeffreys divergence and proposes a novel nonparametric test, the J-Divergence test, which makes feature selection more reliable, especially on imbalanced datasets.
Contribution
It establishes a formal statistical foundation for IV, linking it to Jeffreys divergence, and develops a new nonparametric hypothesis test with asymptotic guarantees for feature selection.
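The link between IV and Jeffreys divergence can be checked numerically. The sketch below is illustrative only and is not taken from the paper's library: it assumes the usual binned definition of IV, sum over bins of (g_i - b_i) * ln(g_i / b_i), where g_i and b_i are the shares of the two classes falling in bin i, and compares it with the symmetrized Kullback-Leibler divergence KL(g||b) + KL(b||g). The function names and the example counts are hypothetical, and every bin is assumed to contain observations from both classes.

```python
import math

def information_value(good_counts, bad_counts):
    """Binned IV: sum_i (g_i - b_i) * ln(g_i / b_i).

    Assumes every bin has a nonzero count in both classes.
    """
    total_good = sum(good_counts)
    total_bad = sum(bad_counts)
    iv = 0.0
    for g, b in zip(good_counts, bad_counts):
        p = g / total_good  # share of the first class in this bin
        q = b / total_bad   # share of the second class in this bin
        iv += (p - q) * math.log(p / q)
    return iv

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def jeffreys_divergence(good_counts, bad_counts):
    """Symmetrized KL divergence: KL(p || q) + KL(q || p)."""
    p = [g / sum(good_counts) for g in good_counts]
    q = [b / sum(bad_counts) for b in bad_counts]
    return kl_divergence(p, q) + kl_divergence(q, p)

# Hypothetical bin counts for one feature (illustrative values only).
good = [400, 300, 200, 100]
bad = [10, 20, 30, 40]

iv = information_value(good, bad)
jd = jeffreys_divergence(good, bad)
print(f"IV = {iv:.6f}, Jeffreys divergence = {jd:.6f}")
```

The two quantities agree up to floating-point error because (p - q) ln(p / q) = p ln(p / q) + q ln(q / p) holds term by term.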
Findings
The J-Divergence test is more reliable than traditional fixed IV thresholds, particularly under strong class imbalance.
The proposed method is model-agnostic and computationally efficient.
An open-source Python library facilitates practical adoption.
Abstract
Information Value (IV) is a widely used technique for feature selection prior to the modeling phase, particularly in credit scoring and related domains. However, conventional IV-based practices rely on fixed empirical thresholds, which lack statistical justification and may be sensitive to characteristics such as class imbalance. In this work, we develop a formal statistical framework for IV by establishing its connection with Jeffreys divergence and propose a novel nonparametric hypothesis test, referred to as the J-Divergence test. Our method provides rigorous asymptotic guarantees and enables interpretable decisions based on \(p\)-values. Numerical experiments, including synthetic and real-world data, demonstrate that the proposed test is more reliable than traditional IV thresholding, particularly under strong imbalance. The test is model-agnostic, computationally efficient, and…
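To illustrate what a decision based on a p-value, rather than a fixed IV cutoff, can look like, here is a minimal sketch that uses a generic permutation test as a stand-in. This is not the J-Divergence test described in the abstract, whose statistic and asymptotic distribution are not reproduced here. The function names, the additive smoothing constant `eps`, the synthetic data, and the 2% positive rate are all assumptions made for the example.

```python
import numpy as np

def iv_from_labels(bins, y, n_bins, eps=0.5):
    """Binned IV for a pre-binned feature and a binary label.

    eps is an additive smoothing constant (an assumption of this sketch)
    that keeps empty bins from producing log(0).
    """
    good = np.bincount(bins[y == 0], minlength=n_bins) + eps
    bad = np.bincount(bins[y == 1], minlength=n_bins) + eps
    p = good / good.sum()
    q = bad / bad.sum()
    return float(np.sum((p - q) * np.log(p / q)))

def permutation_p_value(bins, y, n_bins, n_perm=500, seed=0):
    """Generic permutation test: shuffle labels to simulate 'no association'."""
    rng = np.random.default_rng(seed)
    observed = iv_from_labels(bins, y, n_bins)
    exceed = 0
    for _ in range(n_perm):
        if iv_from_labels(bins, rng.permutation(y), n_bins) >= observed:
            exceed += 1
    # The +1 correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)

# Synthetic, strongly imbalanced data (about 2% positives).
rng = np.random.default_rng(1)
n, n_bins = 5000, 5
y = (rng.random(n) < 0.02).astype(int)

# A pure-noise feature and a feature shifted upward for the positive class.
noise_bins = rng.integers(0, n_bins, size=n)
signal_bins = np.clip(rng.integers(0, n_bins, size=n) + 2 * y, 0, n_bins - 1)

iv_noise, p_noise = permutation_p_value(noise_bins, y, n_bins)
iv_signal, p_signal = permutation_p_value(signal_bins, y, n_bins)
print(f"noise:  IV = {iv_noise:.4f}, p = {p_noise:.4f}")
print(f"signal: IV = {iv_signal:.4f}, p = {p_signal:.4f}")
```

With few positives, the IV of a pure-noise feature need not be close to zero, so comparing the observed value against its distribution under shuffled labels is one way to judge whether it exceeds what chance alone would produce. A permutation test costs many recomputations of IV, which is one reason a test with a known asymptotic distribution, as the abstract describes, can be attractive.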
Taxonomy
Topics: Advanced Statistical Methods and Models · Neural Networks and Applications · Imbalanced Data Classification Techniques
Methods: Feature Selection
