TL;DR
This paper introduces Cognitive Dual-Pathway Reasoning (CDPR), a framework for multimodal intent recognition that combines an intuition pathway and a reasoning pathway to handle cross-modal conflicts and improve semantic understanding.
Contribution
The paper proposes a dual-pathway reasoning framework with a novel inconsistency perception mechanism and multi-view loss, advancing multimodal intent recognition by better modeling cross-modal interactions.
Findings
CDPR achieves state-of-the-art performance on two benchmarks.
The framework demonstrates superior robustness in handling multimodal inconsistencies.
Representation disentanglement improves modality-invariant feature extraction.
Abstract
Multimodal Intent Recognition (MIR) aims to understand complex user intentions by leveraging text, video, and audio signals. However, existing approaches face two key challenges: (1) overlooking intricate cross-modal interactions for distinguishing consistent and inconsistent cues, and (2) ineffectively modeling multimodal conflicts, leading to semantic cancellation. To address these, we propose a novel Cognitive Dual-Pathway Reasoning (CDPR) framework, which constructs a stable semantic foundation via the intuition pathway and mitigates high-level semantic conflicts through the reasoning pathway, cooperatively establishing deep semantic relations. Specifically, we first employ a representation disentanglement strategy to extract modality-invariant and modality-specific features. Subsequently, the intuition pathway aggregates cross-modal consensus using shared features for solid global…
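The disentanglement and consensus steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, not the authors' implementation: the linear projections, the squared dot-product orthogonality penalty, the mean-based consensus, and all function names (`disentangle`, `orthogonality_penalty`, `consensus`) are hypothetical. The abstract does not specify how CDPR realizes these components.

```python
import random

def linear(weights, x):
    """Apply a weight matrix (list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]

def random_matrix(rows, cols, rng):
    """Random projection standing in for a learned encoder (assumption)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def disentangle(features, shared_proj, private_projs):
    """Split each modality's features into a shared (modality-invariant)
    part and a private (modality-specific) part.

    One projection is shared by all modalities; each modality also has
    its own private projection. Inputs are assumed to have equal size.
    """
    shared = {m: linear(shared_proj, x) for m, x in features.items()}
    private = {m: linear(private_projs[m], x) for m, x in features.items()}
    return shared, private

def orthogonality_penalty(shared, private):
    """Sum of squared dot products between shared and private parts.

    A common way to push the two subspaces apart; assumed here, not
    taken from the paper.
    """
    total = 0.0
    for m in shared:
        dot = sum(s * p for s, p in zip(shared[m], private[m]))
        total += dot * dot
    return total

def consensus(shared):
    """Aggregate cross-modal consensus as the element-wise mean of the
    shared features (a simple stand-in for the intuition pathway)."""
    vectors = list(shared.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]

rng = random.Random(0)
in_dim, hid_dim = 4, 3
features = {
    m: [rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
    for m in ("text", "video", "audio")
}
shared_proj = random_matrix(hid_dim, in_dim, rng)
private_projs = {m: random_matrix(hid_dim, in_dim, rng) for m in features}

shared, private = disentangle(features, shared_proj, private_projs)
penalty = orthogonality_penalty(shared, private)
fused = consensus(shared)
```

In a trained model the projections would be learned and the penalty would be one term of the training loss; the sketch only shows how the shared and private features relate to each other and to the fused consensus vector.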
