Estimating Conditional Covariance between labels for Multilabel Data
Laurence A. F. Park, Jesse Read

TL;DR
This paper compares three statistical models to estimate label dependence in multilabel data, revealing their strengths and limitations in measuring constant and dependent covariances, with implications for multilabel analysis.
Contribution
It introduces a comparative analysis of Multivariate Probit, Multivariate Bernoulli, and Staged Logit models for estimating conditional label covariance in multilabel data.
Findings
All models measure constant and dependent covariance equally well, with accuracy depending on the strength of the covariance.
Models falsely detect dependent covariance when only constant covariance is present.
Multivariate Probit has the lowest error rate among the models.
Abstract
Multilabel data should be analysed for label dependence before applying multilabel models. Independence between the labels of multilabel data cannot be measured directly from the label values, due to their dependence on the set of covariates, but can be measured by examining the conditional label covariance using a multivariate Probit model. Unfortunately, the multivariate Probit model provides an estimate of its copula covariance, and so might not be reliable in estimating constant covariance and dependent covariance. In this article, we compare three models (Multivariate Probit, Multivariate Bernoulli and Staged Logit) for estimating the constant and dependent multilabel conditional label covariance. We provide an experiment that allows us to observe each model's measurement of conditional covariance. We found that all models measure constant and dependent covariance equally well,…
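Why label values alone can mislead is easy to see in a small simulation. The sketch below is an illustration only and is not any of the three models compared in the paper. It assumes a bivariate probit data-generating process with one scalar covariate x and known coefficients. It "conditions" on x by subtracting the true P(y_j = 1 | x) instead of estimating it. The helper names `simulate_probit_labels` and `marginal_and_conditional_corr`, along with the parameter values, are hypothetical. Here `rho` is a constant covariance (the same for every x). A dependent covariance would make `rho` a function of x.

```python
import math

import numpy as np


def simulate_probit_labels(n, beta, rho, seed=0):
    """Hypothetical helper: simulate two binary labels from a bivariate probit model.

    Latent z_j = beta[j] * x + e_j, where (e_1, e_2) is standard bivariate normal
    with constant correlation rho; the observed label is y_j = 1[z_j > 0].
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    e = rng.multivariate_normal(np.zeros(2), cov, size=n)
    z = np.outer(x, beta) + e          # shape (n, 2): one latent score per label
    y = (z > 0).astype(float)
    return x, y


# Standard normal CDF, applied elementwise via math.erf.
_std_normal_cdf = np.vectorize(lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0))))


def marginal_and_conditional_corr(x, y, beta):
    """Return (correlation of raw labels, correlation of labels after removing x)."""
    marginal = np.corrcoef(y[:, 0], y[:, 1])[0, 1]
    # Subtract the covariate-driven mean P(y_j = 1 | x) = Phi(beta_j * x),
    # which is known here only because the data are simulated.
    resid = y - _std_normal_cdf(np.outer(x, beta))
    conditional = np.corrcoef(resid[:, 0], resid[:, 1])[0, 1]
    return marginal, conditional


if __name__ == "__main__":
    x, y = simulate_probit_labels(20000, [2.0, 2.0], rho=0.0, seed=1)
    print(marginal_and_conditional_corr(x, y, [2.0, 2.0]))
```

With `beta = [2.0, 2.0]` and `rho = 0`, the raw labels are strongly correlated (theory gives about 0.59) only because both depend on x. The residual correlation is close to zero, so the labels are conditionally independent. Setting `rho = 0.5` makes the residual correlation clearly positive. In real data the conditional means are unknown and must be estimated, which is where the models compared in the paper come in.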
