A Random-effects Approach to Regression Involving Many Categorical Predictors and Their Interactions
Hanmei Sun, Jiangshan Zhang, Jiming Jiang

TL;DR
This paper proposes mixed model prediction for regression with many categorical predictors and their interactions, treating combinations of the categorical effects as random effects. The authors establish the method's theoretical validity, demonstrate empirical advantages over traditional shrinkage methods, and develop uncertainty measures for the predictions.
Contribution
It presents a novel mixed model prediction method for high-dimensional categorical data, establishing its theoretical validity and empirical benefits over existing shrinkage techniques.
Findings
The proposed method outperforms shrinkage methods in empirical tests.
The authors establish the theoretical validity of the proposed method.
The authors also develop measures of uncertainty for the mixed model predictions.
Abstract
Linear model prediction with a large number of potential predictors is both statistically and computationally challenging. The traditional approaches are largely based on shrinkage selection/estimation methods, which are applicable even when the number of potential predictors is (much) larger than the sample size. A situation of the latter scenario occurs when the candidate predictors involve many binary indicators corresponding to categories of some categorical predictors as well as their interactions. We propose an alternative approach to the shrinkage prediction methods in such a case based on mixed model prediction, which effectively treats combinations of the categorical effects as random effects. We establish theoretical validity of the proposed method, and demonstrate empirically its advantage over the shrinkage methods. We also develop measures of uncertainty for the proposed…
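To make the idea of treating combinations of categorical levels as random effects concrete, here is a minimal sketch of a random-intercept model in which each observed combination ("cell") gets its own random effect. This is not the authors' estimator. It assumes the simplest possible model, y = mu + a_cell + error, and it assumes the variance components `var_a` and `var_e` are known, whereas in practice they would be estimated (for example by REML). The function names `cell_blup` and `predict` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cell_blup(cells, y, var_a, var_e):
    """Best linear unbiased prediction for a random-intercept model.

    Assumed model (illustrative only): y_ij = mu + a_i + e_ij, where i indexes
    a cell, i.e. one combination of categorical levels, a_i has variance
    var_a, and e_ij has variance var_e. Both variances are taken as known.
    """
    # Group the responses by cell.
    groups = {}
    for cell, value in zip(cells, y):
        groups.setdefault(tuple(cell), []).append(float(value))

    keys = list(groups)
    n = np.array([len(groups[k]) for k in keys], dtype=float)
    ybar = np.array([np.mean(groups[k]) for k in keys])

    # Generalized least squares estimate of the overall mean:
    # each cell mean has variance var_a + var_e / n_i.
    w = 1.0 / (var_a + var_e / n)
    mu = float(np.sum(w * ybar) / np.sum(w))

    # Predicted random effects: cell deviations shrunk toward zero,
    # with more shrinkage for cells that have fewer observations.
    shrink = n * var_a / (var_e + n * var_a)
    effects = shrink * (ybar - mu)
    return mu, {k: float(a) for k, a in zip(keys, effects)}


def predict(mu, effects, cell):
    """Predict the mean response for a cell; unseen cells fall back to mu."""
    return mu + effects.get(tuple(cell), 0.0)
```

With `cells = [("a", "x"), ("a", "x"), ("b", "y")]` and `y = [1.0, 3.0, 8.0]`, calling `cell_blup(cells, y, 1.0, 1.0)` returns the estimated overall mean and a dictionary of shrunken cell effects. A cell that never appears in the data, such as `("b", "x")`, is predicted at the overall mean. This illustrates why a random-effects formulation stays usable when the number of level combinations exceeds the sample size.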
Taxonomy
Topics: Imbalanced Data Classification Techniques · Statistical Methods and Inference · Advanced Statistical Methods and Models
