Rank-based approach for estimating correlations in mixed ordinal data
Xiaoyun Quan, James G. Booth, Martin T. Wells

TL;DR
This paper introduces a rank-based estimator under a semiparametric latent Gaussian copula model to estimate correlations in high-dimensional mixed data, including ordinal and continuous variables. The estimator comes with a proven concentration rate and is validated through simulations and real-data applications.
Contribution
The paper develops a novel rank-based method for estimating correlations in mixed ordinal and continuous data. It extends existing models from ternary-continuous to general p-level-ordinal and continuous data and provides theoretical guarantees.
Findings
Estimator performs well in simulations
Method effectively captures correlation structure
Validated on real-world datasets
Abstract
High-dimensional mixed data, combining continuous and ordinal variables, are common in many research areas such as genomic studies and survey data analysis. Estimating the underlying correlation among such mixed variables is therefore crucial for inferring their dependence structure. We propose a semiparametric latent Gaussian copula model for this problem. We start by estimating the association in ternary-continuous mixed data via a rank-based approach and then generalize the methodology to p-level-ordinal and continuous mixed data. We also establish a concentration rate for the estimator. Finally, we demonstrate the performance of the proposed estimator through extensive simulations and two real-data case studies: algorithmic risk score evaluation and cancer patient survival data.
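As a rough illustration of the rank-based idea, the sketch below (Python, standard library only) simulates data from a latent Gaussian copula and recovers two basic ingredients. It is not the authors' estimator: the paper's ternary-continuous bridge function is not given in this summary and is not reproduced here. The sketch relies on two standard facts. First, for two continuous variables under a Gaussian copula, Kendall's tau and the latent correlation satisfy Σ = sin(πτ/2). Second, if a ternary variable is generated as X = 1(Z > Δ1) + 1(Z > Δ2), its cutoffs can be recovered as Φ^{-1} of the cumulative proportions. The threshold construction of the ternary variable and all function names (kendall_tau_a, latent_corr_continuous, estimate_cutoffs, simulate) are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal, provides .cdf() and .inv_cdf()


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs / (n choose 2), O(n^2)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * s / (n * (n - 1))


def latent_corr_continuous(x, y):
    """Standard Gaussian-copula bridge for two continuous margins: sin(pi/2 * tau)."""
    return math.sin(math.pi / 2 * kendall_tau_a(x, y))


def estimate_cutoffs(ternary):
    """Hypothetical helper: recover (Delta1, Delta2) of a 0/1/2 variable,
    assuming X = 1(Z > Delta1) + 1(Z > Delta2) with Z standard normal."""
    n = len(ternary)
    p0 = sum(1 for v in ternary if v == 0) / n    # estimates P(Z <= Delta1)
    p01 = sum(1 for v in ternary if v <= 1) / n   # estimates P(Z <= Delta2)
    return PHI.inv_cdf(p0), PHI.inv_cdf(p01)


def simulate(n, rho, cutoffs, seed=0):
    """Hypothetical generator: latent (Z1, Z2) bivariate normal with correlation rho."""
    rng = random.Random(seed)
    x_cont, y_cont, y_tern = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        # exp and cubing are strictly increasing, so ranks (and tau) are unchanged
        x_cont.append(math.exp(z1))
        y_cont.append(z2 ** 3)
        # ternary version of Z2 obtained by thresholding at two cutoffs
        y_tern.append(int(z2 > cutoffs[0]) + int(z2 > cutoffs[1]))
    return x_cont, y_cont, y_tern


# Usage: the continuous pair recovers rho; the ternary margin recovers its cutoffs.
# Estimating the correlation for the ternary-continuous pair (x, t) would need the
# paper's own bridge function, which is not implemented here.
x, y, t = simulate(1000, rho=0.6, cutoffs=(-0.5, 0.7))
print(round(latent_corr_continuous(x, y), 3))  # should be near 0.6
print(tuple(round(d, 3) for d in estimate_cutoffs(t)))  # should be near (-0.5, 0.7)
```

Because the estimate depends only on ranks, the unspecified monotone marginal transforms (here exp and cubing) do not affect it, whereas the Pearson correlation of the transformed data would generally not equal ρ.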
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Bayesian Methods and Mixture Models
