Linear Discriminant Analysis with High-dimensional Mixed Variables
Binyan Jiang, Chenlei Leng, Cheng Wang, Zhongqing Yang, Xinyang Yu

TL;DR
This paper introduces a novel high-dimensional classification method for datasets with mixed categorical and continuous variables, using kernel smoothing and penalized likelihood to improve estimation and classification accuracy.
Contribution
The paper develops a classifier for high-dimensional mixed data that combines kernel smoothing across the cells of the categorical variables with separate parameter estimation, addressing both high dimensionality and mixed variable types.
Findings
The proposed classifier achieves competitive misclassification rates.
Estimation accuracy is validated through simulations and real data.
The method effectively handles high-dimensional mixed variables.
Abstract
Datasets containing both categorical and continuous variables are frequently encountered in many areas, and with the rapid development of modern measurement technologies, the dimensions of these variables can be very high. Despite the recent progress made in modelling high-dimensional data for continuous variables, there is a scarcity of methods that can deal with a mixed set of variables. To fill this gap, this paper develops a novel approach for classifying high-dimensional observations with mixed variables. Our framework builds on a location model, in which the distributions of the continuous variables conditional on categorical ones are assumed Gaussian. We overcome the challenge of having to split data into exponentially many cells, or combinations of the categorical variables, by kernel smoothing, and provide new perspectives for its bandwidth choice to ensure an analogue of…
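The idea of smoothing across cells instead of splitting the data by cell can be illustrated with a small sketch. This is not the paper's estimator: it assumes binary categorical variables, a Hamming-distance kernel with weights `lam ** distance`, and hypothetical function names, and it estimates only a cell mean of the continuous variables.

```python
import numpy as np

def hamming_kernel_weights(u, U, lam):
    """Weight each observation by lam ** (Hamming distance to cell u).

    Assumption for illustration: this simple discrete kernel stands in
    for whatever kernel the paper actually uses.
    """
    dist = np.sum(U != u, axis=1)   # number of mismatched categorical entries
    return np.power(lam, dist)      # lam = 0 keeps exact matches only

def smoothed_cell_mean(u, U, X, lam):
    """Kernel-weighted mean of the continuous variables X for cell u."""
    w = hamming_kernel_weights(u, U, lam)
    return w @ X / w.sum()          # undefined if all weights are zero

# Toy data: three observations, two binary categorical variables,
# one continuous variable.
U = np.array([[0, 0], [0, 1], [1, 1]])
X = np.array([[1.0], [2.0], [6.0]])
target_cell = np.array([0, 0])

no_smoothing = smoothed_cell_mean(target_cell, U, X, lam=0.0)
some_smoothing = smoothed_cell_mean(target_cell, U, X, lam=0.5)
full_pooling = smoothed_cell_mean(target_cell, U, X, lam=1.0)
```

In this sketch `lam` plays the role of a bandwidth: `lam = 0` uses only observations in the exact cell (and fails when the cell is empty), while `lam = 1` pools all observations into a single global mean. Intermediate values borrow strength from neighbouring cells, which is what makes estimation feasible when most cells contain few or no observations.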
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Face and Expression Recognition
