Delayed Commitment for Representation Readiness in Stage-wise Audio-Visual Learning
Xinmeng Xu, Haoran Xie, S. Joe Qin, Lin Li, Xiaohui Tao, Fu Lee Wang

TL;DR
This paper introduces DPC-Net, a framework that improves stage-wise audio-visual learning by estimating the readiness deficiency of intermediate fused states and correcting it. This yields gains on speech separation, event localization, and speech recognition.
Contribution
It formulates premature perceptual commitment as a readiness-deficiency problem. It then proposes DPC-Net to localize and correct bottlenecks in representation propagation for audio-visual tasks.
Findings
DPC-Net improves performance in speech separation, event localization, and speech recognition.
The method effectively localizes intervention-sensitive bottlenecks.
Readiness-guided correction enhances the quality of fused representations.
Abstract
Stage-wise audio-visual encoders propagate fused intermediate states across layers, making the formation of later representations depend on the readiness of earlier fusion states. Strong local audio-visual agreement provides useful correspondence evidence, yet a fused state also needs sufficient cross-layer and cross-modal support before it can reliably guide later fusion. This paper studies this issue through propagation-aware representation readiness and formulates premature perceptual commitment as a readiness-deficiency problem, where local plausibility, propagation influence, and support insufficiency jointly appear at an intermediate stage. We propose the Delayed Perceptual Commitment Network (DPC-Net), an encoder-level framework that estimates an observable readiness-deficiency surrogate, localizes the intervention-sensitive bottleneck, and applies support-aware correction with…
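The abstract describes the readiness-deficiency surrogate only at a high level. The sketch below is one possible reading of it, not the authors' implementation. It assumes that each encoder stage exposes three scalar scores in [0, 1]: local audio-visual agreement, propagation influence, and cross-layer/cross-modal support. These are combined multiplicatively, so the surrogate is large only when local plausibility, propagation influence, and support insufficiency co-occur. Picking the highest-scoring stage is a stand-in for localizing the intervention-sensitive bottleneck. The linear blend is a placeholder for the paper's support-aware correction, whose details are not given here. `StageStats`, `readiness_deficiency`, `localize_bottleneck`, and `support_aware_blend` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class StageStats:
    """Hypothetical per-stage scores for one fused intermediate state (assumed in [0, 1])."""
    local_agreement: float        # local audio-visual agreement (plausibility)
    propagation_influence: float  # how strongly this state shapes later fusion stages
    support: float                # cross-layer and cross-modal support for this state


def readiness_deficiency(s: StageStats) -> float:
    """Hypothetical surrogate: high only when all three conditions appear together.

    A product is used so the score vanishes if any factor is absent, e.g. a
    locally plausible state with no downstream influence is not flagged.
    """
    return s.local_agreement * s.propagation_influence * (1.0 - s.support)


def localize_bottleneck(stages: Sequence[StageStats]) -> int:
    """Return the index of the stage with the largest surrogate score."""
    if not stages:
        raise ValueError("need at least one stage")
    scores = [readiness_deficiency(s) for s in stages]
    return max(range(len(scores)), key=scores.__getitem__)


def support_aware_blend(
    fused: List[float], support_feat: List[float], support: float
) -> List[float]:
    """Hypothetical correction: pull an under-supported state toward a support feature.

    The blend weight is (1 - support), so well-supported states stay nearly unchanged.
    """
    w = 1.0 - support
    return [(1.0 - w) * f + w * g for f, g in zip(fused, support_feat)]


if __name__ == "__main__":
    stages = [
        StageStats(0.9, 0.2, 0.8),  # plausible but weakly propagated and well supported
        StageStats(0.8, 0.9, 0.3),  # plausible, influential, and under-supported
        StageStats(0.5, 0.6, 0.9),  # well supported
    ]
    k = localize_bottleneck(stages)
    print("bottleneck stage:", k)
    print("corrected state:", support_aware_blend([1.0, 0.0], [0.0, 1.0], stages[k].support))
```

The multiplicative form is a design choice made for illustration. It treats the three conditions as jointly necessary, which matches the abstract's phrasing that they "jointly appear". An additive or learned combination would be equally plausible under the information given.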
