Explainable Multimodal Regression via Information Decomposition
Zhaozhao Ma, Shujian Yu

TL;DR
This paper introduces a new multimodal regression framework based on Partial Information Decomposition that enhances interpretability by disentangling modality contributions, and demonstrates improved accuracy and modality selection on real-world datasets.
Contribution
It presents a novel PID-based approach with analytical computation and regularization for better interpretability and performance in multimodal regression tasks.
Findings
Outperforms state-of-the-art methods in accuracy.
Provides interpretable decomposition of modality contributions.
Enables informed modality selection for efficient inference.
Abstract
Multimodal regression aims to predict a continuous target from heterogeneous input sources and typically relies on fusion strategies such as early or late fusion. However, existing methods lack principled tools to disentangle and quantify the individual contributions of each modality and their interactions, limiting the interpretability of multimodal fusion. We propose a novel multimodal regression framework grounded in Partial Information Decomposition (PID), which decomposes modality-specific representations into unique, redundant, and synergistic components. The basic PID framework is inherently underdetermined. To resolve this, we introduce inductive bias by enforcing Gaussianity in the joint distribution of latent representations and the transformed response variable (after inverse normal transformation), thereby enabling analytical computation of the PID terms. Additionally, we…
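The inverse normal transformation mentioned in the abstract can be illustrated with a rank-based variant. The sketch below is a minimal example, assuming a Blom-style offset (`c = 0.375`) and average ranks for ties; the abstract does not say which variant the authors use, and the function name `inverse_normal_transform` is hypothetical.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse normal transform (assumed Blom offset c=0.375).

    Each value is replaced by the standard-normal quantile of its
    (offset) rank, so the output is approximately N(0, 1) regardless
    of the input distribution.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Illustrative, heavily skewed response values (made up for this sketch).
skewed_response = [10000.0, 1.0, 100.0, 10.0, 1000.0]
transformed = inverse_normal_transform(skewed_response)
```

Because the transform depends only on ranks, it preserves the ordering of the response while discarding its original scale, which is why predictions made in the transformed space generally need to be mapped back before being reported.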
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper's goal of creating an intrinsically interpretable multimodal model is appreciated. Framing modality contributions through the principled, information-theoretic lens of PID is a novel and creative approach that could provide a much richer vocabulary for model explanation.
2. The evaluation is comprehensive, spanning six real-world datasets from diverse domains, which demonstrates the framework's versatility. The neuroimaging case study, where the model's interpretations align with e
1. The framework's entire interpretability claim rests on a fragile assumption that the joint latent space can be molded into a Gaussian without destroying the true underlying information dynamics. It seems that the method does not discover the information structure. Instead, it imposes a Gaussian one and then analyzes it. Are the reported PID values a true reflection of the data's properties, or are they merely artifacts of this powerful, and likely mismatched, prior? The lack of guarantees of
1. **Ambitious Idea with End-to-End PID:** Embedding a partial information decomposition directly into a multimodal regression model is novel. The paper tackles the underdetermined nature of PID in continuous domains by introducing Gaussian latent constraints, which is a bold approach. As a result, PIDReg delivers intrinsic interpretability: the learned weights $(w_1,w_2,w_3)$ and computed PID terms give a clear “unique vs. redundant vs. synergistic” attribution of each modality to the predictio
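The underdetermined nature of PID referred to here can be made concrete with a small numeric sketch. It assumes scalar, jointly Gaussian $Z_1, Z_2, Y$ with made-up correlations, and uses the standard closed-form Gaussian mutual information. The three PID consistency equations, $I(Z_1;Y) = R + U_1$, $I(Z_2;Y) = R + U_2$, and $I(Z_1,Z_2;Y) = R + U_1 + U_2 + S$, leave one degree of freedom, so the code treats the redundancy $R$ as a free input. The two choices of $R$ shown are illustrative only and are not claimed to be the paper's resolution; all function names are hypothetical.

```python
import math


def gaussian_mi_scalar(r):
    """I(Z;Y) in nats for jointly Gaussian scalars with correlation r."""
    return -0.5 * math.log(1.0 - r * r)


def gaussian_joint_mi(r1y, r2y, r12):
    """I(Z1,Z2;Y) in nats for jointly Gaussian scalars.

    Uses the squared multiple correlation of Y on (Z1, Z2).
    """
    r_squared = (r1y**2 + r2y**2 - 2.0 * r12 * r1y * r2y) / (1.0 - r12**2)
    return -0.5 * math.log(1.0 - r_squared)


def pid_from_redundancy(i1, i2, i12, redundancy):
    """Solve the three PID consistency equations once redundancy is fixed."""
    return {
        "redundancy": redundancy,
        "unique1": i1 - redundancy,
        "unique2": i2 - redundancy,
        "synergy": i12 - i1 - i2 + redundancy,
    }


# Illustrative correlations (assumptions, not values from the paper).
i1 = gaussian_mi_scalar(0.6)            # I(Z1;Y)
i2 = gaussian_mi_scalar(0.5)            # I(Z2;Y)
i12 = gaussian_joint_mi(0.6, 0.5, 0.3)  # I(Z1,Z2;Y)

# Two different redundancy choices, both consistent with the same three
# mutual informations: the decomposition is not unique without extra structure.
pid_a = pid_from_redundancy(i1, i2, i12, redundancy=min(i1, i2))
pid_b = pid_from_redundancy(i1, i2, i12, redundancy=0.5 * min(i1, i2))
```

Both `pid_a` and `pid_b` reproduce the same three mutual informations with non-negative terms, which is the sense in which an additional constraint or inductive bias is needed to pin down a single decomposition.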
1. **Strong Gaussian Assumption:** A core assumption is that $(Z_1,Z_2,Y)$ is jointly Gaussian so that PID terms admit an analytic solution. However, this assumption is very strong and typically invalid in realistic multimodal tasks. The authors attempt to enforce it via a Shapiro–Wilk test loss, but this only tests for marginal normality. Even if each of $Z_1,Z_2,Y$ is marginally Gaussian, the joint distribution may still be far from multivariate normal (as the paper itself notes). There is no
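The gap between marginal and joint normality raised in this review can be seen in a standard textbook counterexample, sketched below. The construction is not from the paper: $X$ is standard normal and $Y = S \cdot X$ with an independent random sign $S$, so $Y$ is also standard normal, yet $X + Y$ equals zero about half the time, which is impossible for a non-degenerate jointly Gaussian pair. The function name `sign_flip_pair` and the sample size are assumptions made for illustration.

```python
import random
from statistics import fmean, pvariance


def sign_flip_pair(n, seed=0):
    """Draw n samples of (X, Y) with X ~ N(0,1) and Y = S*X, S = +/-1.

    Each marginal is exactly standard normal, but the pair is not
    jointly Gaussian.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        sign = 1.0 if rng.random() < 0.5 else -1.0
        xs.append(x)
        ys.append(sign * x)
    return xs, ys


xs, ys = sign_flip_pair(10_000)

# Marginal summary of Y: close to mean 0 and variance 1.
y_mean = fmean(ys)
y_var = pvariance(ys)

# Fraction of samples where X + Y is exactly zero. For a jointly Gaussian
# pair, X + Y would itself be Gaussian and could not have such a point mass.
frac_zero_sum = sum(1 for x, y in zip(xs, ys) if x + y == 0.0) / len(xs)
```

A univariate normality test applied to `xs` and `ys` separately would be expected to pass in this setting, while the linear combination `x + y` exposes the non-Gaussian joint structure.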
1. The paper innovatively integrates Partial Information Decomposition (PID) into multimodal regression, solving PID’s underdeterminacy via Gaussianity enforcement on latent distributions (enabling analytical PID computation) and combining CS divergence/CMI regularizers, overcoming prior limitations of PID in high-dimensional continuous data. It also proposes a two-stage optimization for stable fusion weight learning, a creative combination of interpretability and regression.
2. Theoretically ri
1. Lacks validation on data with inherently non-Gaussian latents (e.g., discrete event-based robotics data). No experiments on scenarios where latent non-Gaussianity (e.g., multi-modal signals) might bias PID decomposition, missing supplementary tests (e.g., bimodal MNIST) to quantify impact.
2. Critical params (PID convergence $K=5/\delta^t<0.01$, regularization $\lambda_1=\lambda_2=\lambda_3=0.1$) lack sensitivity analysis. No comparison of performance across param values (e.g., $K=3$ or $\la
Taxonomy
Topics: Functional Brain Connectivity Studies · Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare
