TL;DR
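The paper introduces DIRCR (Dual-Inference Rule-Contrastive Reasoning), an abstract visual reasoning model that fuses a local, row-wise inference path with a global, holistic one through gated attention, and adds rule-contrastive learning to improve rule understanding and generalization on three RAVEN datasets.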
Contribution
It proposes a dual-inference reasoning framework with rule-contrastive learning that integrates local and global reasoning for abstract visual reasoning.
Findings
Significantly improves reasoning robustness on RAVEN datasets.
Enhances feature separability through contrastive learning over pseudo-labeled positive and negative rule samples.
Achieves better generalization compared to existing methods.
Abstract
Abstract visual reasoning remains challenging as existing methods often prioritize either global context or local row-wise relations, failing to integrate both, and lack intermediate feature constraints, leading to incomplete rule capture and entangled representations. To address these issues, we propose the Dual-Inference Rule-Contrastive Reasoning (DIRCR) model. Its core component, the Dual-Inference Reasoning Module, combines a local path for row-wise analogical reasoning and a global path for holistic inference, integrated via a gated attention mechanism. Additionally, a Rule-Contrastive Learning Module introduces pseudo-labels to construct positive and negative rule samples, applying contrastive learning to enhance feature separability and promote abstract, transferable rule learning. Experimental results on three RAVEN datasets demonstrate that DIRCR significantly enhances…
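The two mechanisms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gate or the loss, so the sketch assumes a simple element-wise sigmoid gate in place of the gated attention mechanism and a supervised-contrastive-style loss in which samples sharing a pseudo-label are positives. The names `gated_fusion`, `rule_contrastive_loss`, `W`, `b`, and `temperature` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(local_feat, global_feat, W, b):
    """Blend local and global features with an element-wise gate.

    Assumption: a sigmoid gate computed from the concatenated features.
    W has shape (2 * d, d) and b has shape (d,).
    """
    joint = np.concatenate([local_feat, global_feat], axis=-1)
    gate = sigmoid(joint @ W + b)
    return gate * local_feat + (1.0 - gate) * global_feat


def rule_contrastive_loss(feats, pseudo_labels, temperature=0.1):
    """Contrastive loss where equal pseudo-labels mark positive pairs.

    Assumption: a supervised-contrastive-style objective, not
    necessarily the loss used in the paper.
    """
    z = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax over every other sample (self-similarity excluded).
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(
        np.exp(masked - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = sim - log_denom

    # Positives: same pseudo-label, different sample.
    pos = (pseudo_labels[:, None] == pseudo_labels[None, :]) & not_self
    pos_count = pos.sum(axis=1)
    valid = pos_count > 0  # skip anchors that have no positive
    pos_log_prob = np.where(pos, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())


# Toy usage: four rule embeddings, two pseudo-labeled rule groups.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labels = np.array([0, 0, 1, 1])
loss = rule_contrastive_loss(feats, labels)

d = 2
fused = gated_fusion(feats[0], feats[2], np.zeros((2 * d, d)), np.zeros(d))
```

With zero gate weights the gate is 0.5 everywhere, so `fused` is the plain average of the two inputs; a trained gate would instead weight each feature dimension toward the local or the global path. The loss is lowest when embeddings that share a pseudo-label sit close together and away from the rest, which is the separability property the abstract describes.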
