DCER: Dual-Stage Compression and Energy-Based Reconstruction

Yiwen Wang; Jiahao Qin

arXiv:2602.04904·cs.LG·February 6, 2026

Open Access

TL;DR

DCER combines dual-stage compression with energy-based reconstruction to make multimodal fusion robust to noisy inputs and missing modalities, and reports state-of-the-art results on CMU-MOSI, CMU-MOSEI, and CH-SIMS.

Contribution

The paper presents a novel unified framework combining frequency-based compression and energy-based reconstruction to improve multimodal robustness against noise and missing data.
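
As a rough illustration of what frequency-based compression can look like, the sketch below applies a 1-D DCT, keeps only the lowest-frequency coefficients, and inverts the transform. It is not the paper's implementation: the function names (`dct_ii`, `idct_ii`, `compress`) and the `keep` parameter are hypothetical, and plain low-pass truncation is an assumed stand-in for whatever coefficient selection DCER actually uses.

```python
import math

def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_ii(c):
    """Inverse of dct_ii (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def compress(x, keep):
    """Assumed scheme: zero all but the `keep` lowest-frequency coefficients."""
    coeffs = dct_ii(x)
    kept = [v if k < keep else 0.0 for k, v in enumerate(coeffs)]
    return idct_ii(kept)

# Toy signal: a slow cosine plus fast alternating noise (illustrative values only).
n = 16
clean = [math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
noisy = [c + 0.3 * (-1) ** i for i, c in enumerate(clean)]
denoised = compress(noisy, keep=4)
```

Because the alternating noise lives mostly in the highest DCT coefficients, truncation removes most of it while leaving the slow component nearly intact. A learned or task-aware choice of which coefficients to keep would replace the fixed `keep` cutoff in practice.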

Findings

01

State-of-the-art performance on CMU-MOSI, CMU-MOSEI, and CH-SIMS datasets.

02

Energy-based uncertainty correlates strongly with prediction errors.

03

Robustness results favor multimodal fusion even when a large share of modalities is missing.
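
To make the second finding concrete, the sketch below shows one way a correlation between per-sample energy and prediction error could be computed. The page does not say which coefficient is reported, so Spearman's rank correlation is an assumption here; the helper names and the five data points are made up for illustration and are not results from the paper.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def spearman(a, b):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(a), ranks(b))

# Illustrative toy values only (not from the paper).
energies = [0.1, 0.4, 0.2, 0.9, 0.7]
errors = [0.05, 0.30, 0.35, 0.80, 0.60]
rho = spearman(energies, errors)
```

A high value of `rho` means that samples with larger final energy tend to be the ones with larger prediction error, which is what makes the energy usable as an uncertainty signal.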

Abstract

Multimodal fusion faces two robustness challenges: noisy inputs degrade representation quality, and missing modalities cause prediction failures. We propose DCER, a unified framework addressing both challenges through dual-stage compression and energy-based reconstruction. The compression stage operates at two levels: within-modality frequency transforms (wavelet for audio, DCT for video) remove noise while preserving task-relevant patterns, and cross-modality bottleneck tokens force genuine integration rather than modality-specific shortcuts. For missing modalities, energy-based reconstruction recovers representations via gradient descent on a learned energy function, with the final energy providing intrinsic uncertainty quantification (ρ > 0.72 correlation with prediction error). Experiments on CMU-MOSI, CMU-MOSEI, and CH-SIMS demonstrate state-of-the-art performance…
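
The reconstruction step described in the abstract (gradient descent on an energy function, with the final energy read off as an uncertainty score) can be sketched with a toy energy. Everything specific here is an assumption: the paper's energy is learned, whereas this one is a hand-written quadratic, and the names `pred` (a cross-modal guess at the missing representation), `prior` (a mean representation), `lam`, `step`, and `n_steps` are hypothetical.

```python
def energy(z, pred, prior, lam):
    """Toy quadratic energy: fit to a cross-modal prediction plus a prior term."""
    fit = sum((zi - pi) ** 2 for zi, pi in zip(z, pred))
    reg = sum((zi - mi) ** 2 for zi, mi in zip(z, prior))
    return 0.5 * fit + 0.5 * lam * reg

def energy_grad(z, pred, prior, lam):
    """Gradient of the toy energy with respect to z."""
    return [(zi - pi) + lam * (zi - mi) for zi, pi, mi in zip(z, pred, prior)]

def reconstruct(pred, prior, lam=0.5, step=0.1, n_steps=200):
    """Recover a missing representation by gradient descent on the energy.

    Returns the reconstruction and its final energy; the final energy is
    used here as an (assumed) uncertainty score.
    """
    z = list(prior)  # start from the prior mean
    for _ in range(n_steps):
        grad = energy_grad(z, pred, prior, lam)
        z = [zi - step * gi for zi, gi in zip(z, grad)]
    return z, energy(z, pred, prior, lam)

# Two toy cases: a prediction close to the prior and one far from it.
z_near, e_near = reconstruct(pred=[0.1, -0.1], prior=[0.0, 0.0])
z_far, e_far = reconstruct(pred=[1.0, -2.0], prior=[0.0, 0.0])
```

In this toy setting the minimum energy grows with the disagreement between `pred` and `prior`, so `e_far` exceeds `e_near`. That mirrors the idea of reading uncertainty from the converged energy, although a learned energy would not have this closed-form behavior.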

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis