Explainable Multimodal Regression via Information Decomposition

Zhaozhao Ma; Shujian Yu

arXiv:2512.22102 · cs.LG · December 29, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces a new multimodal regression framework based on Partial Information Decomposition that enhances interpretability by disentangling modality contributions, and demonstrates improved accuracy and modality selection on real-world datasets.

Contribution

It presents a novel PID-based approach that computes the PID terms analytically and adds regularization, improving both interpretability and performance in multimodal regression tasks.
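To show why Gaussianity makes the PID terms analytical, here is a minimal sketch in Python with NumPy. For jointly Gaussian variables, every mutual information reduces to log-determinants of covariance blocks: I(Z;Y) = ½ log(det Σ_Z · det Σ_Y / det Σ_ZY). The sketch assumes a scalar target Y and the minimum-mutual-information (MMI) redundancy, R = min(I(Y;Z1), I(Y;Z2)), a common closed-form choice for Gaussian systems. The paper's exact PID definition is not given here, so the redundancy choice and the function names `gaussian_mi` and `gaussian_pid_mmi` are illustrative assumptions.

```python
import numpy as np


def gaussian_mi(cov, idx_z, idx_y):
    """I(Z; Y) in nats for jointly Gaussian variables with joint covariance `cov`."""
    idx_z, idx_y = list(idx_z), list(idx_y)
    idx_zy = idx_z + idx_y
    # slogdet is numerically safer than log(det(...)) for larger blocks.
    _, logdet_z = np.linalg.slogdet(cov[np.ix_(idx_z, idx_z)])
    _, logdet_y = np.linalg.slogdet(cov[np.ix_(idx_y, idx_y)])
    _, logdet_zy = np.linalg.slogdet(cov[np.ix_(idx_zy, idx_zy)])
    return 0.5 * (logdet_z + logdet_y - logdet_zy)


def gaussian_pid_mmi(cov, d1, d2):
    """PID of a scalar target Y w.r.t. latents Z1 and Z2 under the MMI redundancy.

    Assumed layout of `cov`: the first d1 dims are Z1, the next d2 dims are Z2,
    and the last dim is Y.
    """
    z1 = list(range(d1))
    z2 = list(range(d1, d1 + d2))
    y = [d1 + d2]
    i1 = gaussian_mi(cov, z1, y)
    i2 = gaussian_mi(cov, z2, y)
    i12 = gaussian_mi(cov, z1 + z2, y)
    red = min(i1, i2)  # MMI redundancy (an assumption, see lead-in)
    return {
        "unique1": i1 - red,
        "unique2": i2 - red,
        "redundant": red,
        "synergy": i12 - i1 - i2 + red,  # equals i12 - max(i1, i2) >= 0
    }


# Example: Y = Z1 + Z2 + noise, with independent unit-variance Z1, Z2 and noise.
cov = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0],
                [1.0, 1.0, 3.0]])
print(gaussian_pid_mmi(cov, d1=1, d2=1))
```

In this example the two modalities are symmetric, so the unique terms vanish. The redundancy is ½ ln 1.5 nats and the synergy is ½ ln 2 nats, up to rounding. The four terms always sum to I(Y; Z1, Z2). With learned latents, `cov` would instead be estimated from samples, for instance with `np.cov(np.column_stack([z1, z2, y]), rowvar=False)`.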

Findings

1. Outperforms state-of-the-art methods in accuracy.
2. Provides an interpretable decomposition of modality contributions.
3. Enables informed modality selection for efficient inference.

Abstract

Multimodal regression aims to predict a continuous target from heterogeneous input sources and typically relies on fusion strategies such as early or late fusion. However, existing methods lack principled tools to disentangle and quantify the individual contributions of each modality and their interactions, limiting the interpretability of multimodal fusion. We propose a novel multimodal regression framework grounded in Partial Information Decomposition (PID), which decomposes modality-specific representations into unique, redundant, and synergistic components. The basic PID framework is inherently underdetermined. To resolve this, we introduce inductive bias by enforcing Gaussianity in the joint distribution of latent representations and the transformed response variable (after inverse normal transformation), thereby enabling analytical computation of the PID terms. Additionally, we…
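The inverse normal transformation of the response mentioned in the abstract can be illustrated with the common rank-based variant. Each value is replaced by its rank, the rank is rescaled with an offset c (Blom's c = 3/8 here), and the result is mapped through the standard normal quantile function. This is a sketch under that assumption. The paper may use a different offset or tie handling, and `rank_inverse_normal` is a hypothetical helper name.

```python
import numpy as np
from statistics import NormalDist


def rank_inverse_normal(y, c=3.0 / 8.0):
    """Rank-based inverse normal transform (Blom offset by default)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    order = np.argsort(y, kind="mergesort")
    ranks = np.empty(n, dtype=float)
    ranks[order] = np.arange(1, n + 1)  # ties broken by sort order, not averaged
    p = (ranks - c) / (n - 2.0 * c + 1.0)  # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf  # standard normal quantile function
    return np.array([inv_cdf(pi) for pi in p])


# A heavily skewed response becomes approximately standard normal.
rng = np.random.default_rng(0)
y_skewed = np.exp(rng.standard_normal(1001))
y_int = rank_inverse_normal(y_skewed)
print(round(float(y_int.mean()), 6), round(float(y_int.std()), 3))
```

The transform is monotone, so it preserves the ordering of the response. It also makes the marginal of the transformed response Gaussian by construction, which is a prerequisite for the joint-Gaussian modeling described above.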

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper's goal of creating an intrinsically interpretable multimodal model is appreciated. Framing modality contributions through the principled, information-theoretic lens of PID is a novel and creative approach that could provide a much richer vocabulary for model explanation.
2. The evaluation is comprehensive, spanning six real-world datasets from diverse domains, which demonstrates the framework's versatility. The neuroimaging case study, where the model's interpretations align with e

Weaknesses

1. The framework's entire interpretability claim rests on a fragile assumption that the joint latent space can be molded into a Gaussian without destroying the true underlying information dynamics. It seems that the method does not discover the information structure. Instead, it imposes a Gaussian one and then analyzes it. Are the reported PID values a true reflection of the data's properties, or are they merely artifacts of this powerful, and likely mismatched, prior? The lack of guarantees of

Reviewer 02 · Rating 4 · Confidence 4

Strengths

1. **Ambitious Idea with End-to-End PID:** Embedding a partial information decomposition directly into a multimodal regression model is novel. The paper tackles the underdetermined nature of PID in continuous domains by introducing Gaussian latent constraints, which is a bold approach. As a result, PIDReg delivers intrinsic interpretability: the learned weights $(w_1,w_2,w_3)$ and computed PID terms give a clear “unique vs. redundant vs. synergistic” attribution of each modality to the predictio

Weaknesses

1. **Strong Gaussian Assumption:** A core assumption is that $(Z_1,Z_2,Y)$ is jointly Gaussian so that PID terms admit an analytic solution. However, this assumption is very strong and typically invalid in realistic multimodal tasks. The authors attempt to enforce it via a Shapiro–Wilk test loss, but this only tests for marginal normality. Even if each of $Z_1,Z_2,Y$ is marginally Gaussian, the joint distribution may still be far from multivariate normal (as the paper itself notes). There is no
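The reviewer's point that marginal normality does not imply joint normality can be checked with a classic counterexample, sketched here with NumPy as an illustration rather than anything taken from the paper. Let Z ~ N(0, 1) and W = S·Z, where S is an independent random sign. W is exactly standard normal, so each variable on its own would pass a per-variable test such as Shapiro–Wilk. Yet (Z, W) is not jointly Gaussian. The two are uncorrelated but clearly dependent, and Z + W equals zero with probability 1/2. No linear combination of a jointly Gaussian pair can have a point mass that is not the whole distribution. The helper name `sign_flip_pair` is hypothetical.

```python
import numpy as np


def sign_flip_pair(n, seed=0):
    """Return (z, w): each marginally N(0, 1), but not jointly Gaussian."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    s = rng.choice([-1.0, 1.0], size=n)  # independent fair random sign
    w = s * z  # flipping the sign of a symmetric variable keeps the N(0, 1) marginal
    return z, w


z, w = sign_flip_pair(100_000)
corr = np.corrcoef(z, w)[0, 1]                    # close to 0: uncorrelated
frac_zero = np.mean(z + w == 0.0)                 # close to 0.5: point mass at zero
same_abs = bool(np.all(np.abs(z) == np.abs(w)))   # |W| == |Z|: strongly dependent
print(f"corr={corr:.3f}, P(Z+W=0)~{frac_zero:.3f}, |Z|==|W|: {same_abs}")
```

A loss that only pushes each coordinate toward marginal normality cannot rule out this kind of joint structure. The Gaussian closed-form PID terms would then describe the Gaussian surrogate rather than the actual latents.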

Reviewer 03 · Rating 2 · Confidence 3

Strengths

1. The paper innovatively integrates Partial Information Decomposition (PID) into multimodal regression. It resolves PID's underdeterminacy by enforcing Gaussianity on the latent distributions, which enables analytical PID computation, and it combines CS divergence and CMI regularizers, overcoming prior limitations of PID in high-dimensional continuous data. It also proposes a two-stage optimization for stable fusion-weight learning, a creative combination of interpretability and regression.
2. Theoretically ri

Weaknesses

1. Lacks validation on data with inherently non-Gaussian latents (e.g., discrete event-based robotics data). There are no experiments on scenarios where latent non-Gaussianity (e.g., multi-modal signals) might bias the PID decomposition. Supplementary tests that would quantify this impact (e.g., bimodal MNIST) are also missing.
2. Critical parameters (PID convergence $K=5$, $\delta^t<0.01$; regularization $\lambda_1=\lambda_2=\lambda_3=0.1$) lack a sensitivity analysis. No comparison of performance across parameter values (e.g., $K=3$ or $\la

Taxonomy

Topics: Functional Brain Connectivity Studies · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare