# An approximate-copula distribution for statistical modeling

**Authors:** Sarah S. Ji, Benjamin B. Chu, Hua Zhou, Kenneth Lange, Michael Sohn

PMC · DOI: 10.1371/journal.pcbi.1013922 · 2026-03-13

## TL;DR

This paper introduces a new class of approximate-copula probability distributions for efficiently analyzing datasets with correlated responses of mixed type (continuous, binary, and count).

## Contribution

A new class of approximate-copula distributions is introduced for efficient statistical modeling of mixed-type correlated data.

## Key findings

- The new distribution allows explicit calculation of moments and distributions needed for maximum likelihood estimation.
- The model is applied to GWAS data with continuous, binary, and count responses, showing flexibility and scalability.
- The approximate-copula model is shown to be computationally efficient in high-dimensional settings.

## Abstract

Copulas, generalized estimating equations, and generalized linear mixed models enable the analysis of grouped data where non-normal responses are correlated. Unfortunately, parameter estimation remains challenging in these three frameworks. Based on prior work of Tonda, we derive a new class of probability density functions that allow explicit calculation of moments, marginal and conditional distributions, and the score and observed information needed in maximum likelihood estimation. We also illustrate how the new distribution flexibly models longitudinal data following a non-Gaussian distribution. Finally, we conduct a tri-variate genome-wide association analysis on dichotomized systolic and diastolic blood pressure and body mass index data from the UK-Biobank, showcasing the modeling potential and computational scalability of the new distributional family.

Modeling correlated responses is computationally challenging beyond the Gaussian realm. For instance, how should repeated binary outcomes in longitudinal studies be modeled? When a dataset contains both continuous and discrete responses, how can their dependence be captured in a principled and efficient way? This paper introduces a new class of probability distributions that enables flexible modeling of correlated responses of mixed type. Inspired by statistical copulas, the proposed approach is designed to remain computationally efficient even in high-dimensional settings. We refer to this framework as an approximate copula model and show that it provides a promising alternative to classical methods such as generalized linear mixed models and generalized estimating equations. To demonstrate its flexibility and scalability, we apply the approximate-copula model to genome-wide association (GWAS) data involving a mixture of continuous, binary, and count responses.
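To make the idea of "explicit calculation" concrete, here is a minimal sketch for repeated binary outcomes. It assumes one plausible form for such a density: independent marginals multiplied by a quadratic form in the standardized residuals, with a closed-form normalizing constant. That form, the function name `approx_copula_pmf`, and the example values of `p` and `gamma` are illustrative assumptions; none of them is quoted from the summary above, and the paper's exact parameterization may differ.

```python
import itertools
import numpy as np

def approx_copula_pmf(y, p, gamma):
    """Joint pmf of a binary vector y under the ASSUMED form
    prod_i f_i(y_i) * (1 + 0.5 * r' Gamma r) / (1 + 0.5 * tr(Gamma)),
    where r holds the standardized residuals of y (hypothetical sketch)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    base = np.prod(np.where(y == 1, p, 1 - p))   # independent Bernoulli marginals
    r = (y - p) / np.sqrt(p * (1 - p))           # mean 0, variance 1 under the marginals
    return base * (1 + 0.5 * r @ gamma @ r) / (1 + 0.5 * np.trace(gamma))

# Illustrative inputs: three binary responses and a positive semidefinite Gamma.
p = np.array([0.2, 0.5, 0.7])
A = np.array([[1.0, 0.5, 0.2],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
gamma = 0.3 * A @ A.T

# Enumerate all 2^3 outcomes and sum the pmf.
outcomes = list(itertools.product([0, 1], repeat=3))
probs = np.array([approx_copula_pmf(y, p, gamma) for y in outcomes])
total = probs.sum()
```

Under this assumed form, the normalizing constant needs no numerical integration: the standardized residuals have mean zero and unit variance under the independent marginals, so the expected quadratic form equals the trace of `gamma`. A positive semidefinite `gamma` also keeps every probability nonnegative.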

## Full-text entities

- **Genes:** TCF20 (transcription factor 20) [NCBI Gene 6942] {aka AR1, DDVIBA, SPBP, TCF-20}
- **Diseases:** hypertension (MESH:D006973), VC (MESH:C566443)
- **Chemicals:** NAAS (-)
- **Species:** Nicotiana tabacum (American tobacco, species) [taxon 4097], Homo sapiens (human, species) [taxon 9606]
- **Mutations:** rs10871777, rs1421085, rs2681492, rs4500930, rs11191548, rs653178, rs17367504, rs7721099, rs13107325, rs2293579, rs34783010

## Figures

50 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12998956/full.md

---
Source: https://tomesphere.com/paper/PMC12998956