# Learning Continuous Decomposable Models Using Mutual Information and Statistical Copulas

**Authors:** Luiz Desuó Neto, Henrique de Oliveira Caetano, Matheus de Souza Sant’Anna Fogliatto, Carlos Dias Maciel

PMC · DOI: 10.3390/e28030293 · 2026-03-04

## TL;DR

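This paper introduces a method for learning decomposable (chordal) dependence graphs from continuous data by scoring structures with copula-based mutual information. The method improves edge recovery on synthetic benchmarks and yields interpretable dependence summaries on real gene expression data.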

## Contribution

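An identity that expresses the mutual information of a decomposable graphical model as a difference of clique and separator copula entropies, together with a nonparametric pipeline for estimating those entropies from data.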
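The identity can be reconstructed from standard definitions. This reconstruction is not necessarily the paper's exact statement or notation. It assumes that "mutual information" means the multi-information $I(\mathbf{X}) = \sum_i H(X_i) - H(\mathbf{X})$, that $u_i = F_i(x_i)$ are the probability-integral transforms, and that the copula entropy is $H_c(\mathbf{U}_A) = -\mathbb{E}[\log c_A(\mathbf{U}_A)]$. Here $\mathcal{C}$ denotes the cliques and $\mathcal{S}$ the separators (with multiplicity) of a junction tree.

$$
\begin{aligned}
f(\mathbf{x}) &= \frac{\prod_{C \in \mathcal{C}} f_C(\mathbf{x}_C)}{\prod_{S \in \mathcal{S}} f_S(\mathbf{x}_S)},
\qquad f_A(\mathbf{x}_A) = c_A(\mathbf{u}_A) \prod_{i \in A} f_i(x_i), \\
c(\mathbf{u}) &= \frac{\prod_{C \in \mathcal{C}} c_C(\mathbf{u}_C)}{\prod_{S \in \mathcal{S}} c_S(\mathbf{u}_S)}
\quad \text{(each } f_i \text{ lies in one more clique than separator, so the marginals cancel)}, \\
I(\mathbf{X}) &= \sum_{i} H(X_i) - H(\mathbf{X}) = -H_c(\mathbf{U})
= \sum_{S \in \mathcal{S}} H_c(\mathbf{U}_S) - \sum_{C \in \mathcal{C}} H_c(\mathbf{U}_C).
\end{aligned}
$$

The cancellation in the second line follows from the running intersection property. The cliques that contain variable $i$ form a subtree of the junction tree, so they outnumber the separators that contain $i$ by exactly one. Only copula terms survive, which is why the decomposition measures dependence alone.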

## Key findings

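- The proposed method improves edge recovery accuracy on synthetic chordal benchmarks compared with a likelihood-driven nonparametric baseline.
- It produces interpretable dependence summaries on a real airway epithelial gene expression dataset.
- A practical nonparametric copula entropy estimation pipeline is developed, based on pseudo-observations and a probit-transformed kernel density estimator evaluated by predictive log score.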
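The estimation steps listed above (pseudo-observations, a probit-transformed KDE, predictive log score) can be sketched in Python as follows. This is a minimal illustration, and it relies on several assumptions that the summary does not state:

- ranks are scaled by n + 1;
- the bandwidth is SciPy's default Scott rule in `gaussian_kde`;
- the predictive log score is computed on a single random 50/50 train/test split.

The function names `pseudo_observations` and `probit_kde_copula_entropy` are hypothetical and are not the authors' code.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm, rankdata


def pseudo_observations(x):
    """Map each column to (0, 1) via its ranks: u_ij = rank_ij / (n + 1).
    (Hypothetical helper; the n + 1 scaling is an assumption.)"""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    return np.column_stack([rankdata(col) / (n + 1) for col in x.T])


def probit_kde_copula_entropy(u, train_frac=0.5, seed=0):
    """Estimate H_c = -E[log c(U)] with a KDE fitted on the probit scale and
    scored on held-out points (a predictive log score).
    (Hypothetical helper; the 50/50 split and Scott bandwidth are assumptions.)"""
    u = np.asarray(u, dtype=float)
    if u.shape[1] < 2:
        return 0.0  # a single uniform margin has zero copula entropy
    rng = np.random.default_rng(seed)
    order = rng.permutation(u.shape[0])
    n_train = int(train_frac * u.shape[0])
    z = norm.ppf(u)  # probit: unit hypercube -> R^d, mitigating boundary effects
    z_train, z_test = z[order[:n_train]], z[order[n_train:]]
    kde = gaussian_kde(z_train.T)  # SciPy expects shape (d, n)
    # Change of variables: log c(u) = log f_Z(z) - sum_j log phi(z_j)
    log_c = kde.logpdf(z_test.T) - norm.logpdf(z_test).sum(axis=1)
    return -np.mean(log_c)


# Usage: Gaussian dependence (rho = 0.8) hidden behind non-Gaussian marginals
rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)
x = np.column_stack([np.exp(x1), x2 ** 3])  # monotone maps leave the copula unchanged
h = probit_kde_copula_entropy(pseudo_observations(x))
print(h, 0.5 * np.log(1 - 0.8 ** 2))
```

The exponential and cubic maps are monotone, so the copula of `x` is the Gaussian copula with ρ = 0.8. Its copula entropy is 0.5·log(1 − 0.8²) ≈ −0.51, and the printed estimate should land near that value despite the skewed marginals. Fitting the KDE on the probit scale rather than directly on the unit square keeps kernel mass from leaking past the boundaries of the hypercube.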

## Abstract

Learning dependence graphs from multivariate continuous data is challenging when marginal distributions are heterogeneous, since likelihood-based nonparametric scores can be sensitive to smoothing choices and can confound marginal irregularities, including non-identifiability, with dependence. This work studies structure learning in the class of decomposable (chordal) Markov random fields, where junction tree factorizations enable tractable inference and local score updates.

Our first contribution is a theoretical result showing that, under decomposability, mutual information can be expressed as a difference of clique/separator copula entropies, yielding a dependence-only decomposition aligned with the clique/separator structure. Building on this identity, we define an information-theoretic objective for decomposable graphs with a complexity penalty that preserves clique/separator additivity, and we derive closed-form local score differences for chordality-preserving single-edge insertions and deletions.

To make the score computable from data, we instantiate clique/separator copula entropies using pseudo-observations and a probit-transformed kernel density estimator with predictive log score evaluation to mitigate boundary effects on the unit hypercube. The resulting nonparametric greedy procedure improves edge recovery accuracy on synthetic chordal benchmarks compared with a likelihood-driven nonparametric baseline, and it produces interpretable dependence summaries on an airway epithelial gene expression dataset.

Concretely, this paper contributes:

1. a decomposable mutual information identity via clique/separator copula entropies;
2. a copula information score with an additive complexity penalty for decomposable graphs;
3. a closed-form local score, enabling greedy chordal add or delete search;
4. a practical nonparametric copula entropy estimation pipeline; and
5. empirical gains on synthetic and real data.
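A chordality-preserving greedy step needs two ingredients: a test that a single-edge insertion or deletion keeps the graph chordal, and the score change for that move. The sketch below uses two standard graph characterizations:

- adding uv to a chordal graph keeps it chordal if and only if the common neighbours N(u) ∩ N(v) separate u from v;
- deleting uv keeps the graph chordal if and only if uv lies in exactly one maximal clique.

Applying the clique/separator copula entropy identity, the change in mutual information is H_c(U_{S∪{u}}) + H_c(U_{S∪{v}}) − H_c(U_{S∪{u,v}}) − H_c(U_S) = I(X_u; X_v | X_S) with S = N(u) ∩ N(v), negated for a deletion. This is a derivation under those assumptions and not necessarily the paper's exact local score. The complexity penalty is omitted because its form is not given here, and a closed-form Gaussian-copula entropy, 0.5·log det R_A, stands in for the nonparametric estimator so that the example stays short. All function names are hypothetical.

```python
import numpy as np


def make_adj(n, edges):
    """Undirected graph on nodes 0..n-1 as a dict of neighbour sets (hypothetical helper)."""
    adj = {i: set() for i in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def can_add_edge(adj, u, v):
    """A chordal G plus edge uv stays chordal iff N(u) & N(v) separates u from v."""
    if u == v or v in adj[u]:
        return False
    blocked = adj[u] & adj[v]
    seen, stack = {u}, [u]
    while stack:  # DFS from u that never enters a common neighbour
        x = stack.pop()
        for y in adj[x]:
            if y == v:
                return False  # a path avoiding N(u) & N(v) would close a chordless cycle
            if y not in seen and y not in blocked:
                seen.add(y)
                stack.append(y)
    return True


def can_delete_edge(adj, u, v):
    """Removing uv keeps a chordal G chordal iff uv lies in one maximal clique,
    i.e. iff the common neighbours of u and v are pairwise adjacent."""
    if v not in adj[u]:
        return False
    common = sorted(adj[u] & adj[v])
    return all(b in adj[a] for i, a in enumerate(common) for b in common[i + 1:])


def gaussian_copula_entropy(R, idx):
    """Stand-in estimator: the Gaussian-copula entropy 0.5 * log det R_A
    (an assumption replacing the paper's nonparametric estimate)."""
    idx = sorted(idx)
    if len(idx) < 2:
        return 0.0  # copula entropy of zero or one variable is zero
    return 0.5 * np.linalg.slogdet(R[np.ix_(idx, idx)])[1]


def local_delta(adj, R, u, v):
    """Change in mutual information when toggling edge uv:
    +I(u; v | S) for an insertion, -I(u; v | S) for a deletion, S = N(u) & N(v)."""
    S = adj[u] & adj[v]

    def h(A):
        return gaussian_copula_entropy(R, A)

    gain = h(S | {u}) + h(S | {v}) - h(S | {u, v}) - h(S)
    return -gain if v in adj[u] else gain


# Usage: chain 0 - 1 - 2 with Markov correlations (R02 = R01 * R12)
R_chain = np.array([[1.0, 0.8, 0.64], [0.8, 1.0, 0.8], [0.64, 0.8, 1.0]])
chain = make_adj(3, [(0, 1), (1, 2)])
print(can_add_edge(chain, 0, 2), local_delta(chain, R_chain, 0, 2))
```

The usage line should print `True` and a gain of zero up to floating-point rounding. With these correlations, X_0 and X_2 are conditionally independent given X_1, so adding the chord 0 to 2 buys no information. A greedy search would evaluate `local_delta`, plus a penalty term, only for moves that pass `can_add_edge` or `can_delete_edge`. Each evaluation touches the four sets S, S ∪ {u}, S ∪ {v} and S ∪ {u, v} and never the whole graph.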

## Figures

The complete paper contains one captioned figure: https://tomesphere.com/paper/PMC13024919/full.md

---
Source: https://tomesphere.com/paper/PMC13024919