A nominal association matrix with feature selection for categorical data
Wenxue Huang, Yong Shi, Xiaogang Wang

TL;DR
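This paper introduces a probabilistic association matrix for categorical data. The matrix measures how strongly each category of one variable is associated with another categorical variable (a local-to-global association), and it supports feature selection and the assessment of expected prediction accuracy.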
Contribution
The paper presents an association matrix and a derived association vector for categorical variables, together with a general scheme of global-to-global association measures with flexible weight vectors, for use in feature selection and predictive modeling.
Findings
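Applied to financial and survey data, with supporting simulation results
Supports feature selection for categorical variables
Yields expected distributions of the two error types and of per-category accuracy lift for a multinomial response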
Abstract
We introduce an informative probabilistic association matrix to measure a proportional local-to-global association of categories of one variable with another categorical variable. Towards a probability-based proportional prediction, the association matrix gives rise to the expected predictive distribution of the first and second types of errors for a multinomial response variable. In addition, the normalization of the diagonal of the matrix gives rise to an association vector, which provides the expected category accuracy lift rate distribution. A general scheme of global-to-global association measures with flexible weight vectors is further developed from the matrix. A hierarchy of equivalence relations defined by the association matrix and vector is shown. Applications to financial and survey data, together with simulation results, are presented.
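The abstract does not state the formula for the matrix, so the following is only a minimal sketch of one plausible reading of "probability-based proportional prediction": the response Y is predicted by drawing a category from the conditional distribution p(Y | X). Under that assumption, the entry for true category s and predicted category t is E[p(s | X) p(t | X)] / p(s), which gives a row-stochastic expected confusion matrix. The function name `proportional_confusion_matrix` and this formula are illustrative assumptions, not the authors' definitions.

```python
import numpy as np


def proportional_confusion_matrix(joint):
    """Expected confusion matrix under proportional prediction (assumed form).

    joint[i, s] is P(X = i, Y = s); every row and column is assumed to have
    positive total mass. Returns G with
        G[s, t] = E_X[p(Y=s | X) * p(Y=t | X)] / P(Y=s),
    i.e. the probability of predicting t given that the true category is s,
    when the prediction is drawn from p(Y | X).
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()          # normalize counts to probabilities
    p_x = joint.sum(axis=1)              # marginal P(X = i)
    p_y = joint.sum(axis=0)              # marginal P(Y = s)
    cond = joint / p_x[:, None]          # conditional p(Y = s | X = i)
    # E_X[p(s|X) p(t|X)] = sum_i P(X=i) * cond[i, s] * cond[i, t]
    expected = (cond * p_x[:, None]).T @ cond
    return expected / p_y[:, None]       # condition on the true category s


if __name__ == "__main__":
    # Hypothetical contingency table: rows are categories of X, columns of Y.
    counts = np.array([[30, 10], [5, 55]])
    G = proportional_confusion_matrix(counts)
    print(G)           # each row sums to 1
    print(np.diag(G))  # expected accuracy for each true category
```

In this sketch, two limiting cases behave as one would expect: if X determines Y exactly, the matrix is the identity, and if X and Y are independent, every row equals the marginal distribution of Y, so the diagonal offers no lift over the baseline.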
Taxonomy
Topics: Spatial and Panel Data Analysis · Bayesian Modeling and Causal Inference · Advanced Statistical Methods and Models
