CORA: Consistency-Guided Semi-Supervised Framework for Reasoning Segmentation
Prantik Howlader, Hoang Nguyen-Canh, Srijan Das, Jingyi Xu, Hieu Le, Dimitris Samaras

TL;DR
CORA is a semi-supervised framework that enhances reasoning segmentation by leveraging limited labeled data and unlabeled images through consistency-guided pseudo-label filtering and contrastive alignment, achieving state-of-the-art results with minimal supervision.
Contribution
The paper introduces CORA, a novel semi-supervised reasoning segmentation method that combines visual instructions, pseudo-label filtering, and contrastive alignment to improve performance under limited annotations.
Findings
Outperforms baselines with as few as 100 labeled images on Cityscapes.
Achieves +2.4% improvement with only 180 labeled images on PanNuke.
Demonstrates robustness under constrained annotation settings.
Abstract
Reasoning segmentation seeks pixel-accurate masks for targets referenced by complex, often implicit instructions, requiring context-dependent reasoning over the scene. Recent multimodal language models have advanced instruction-following segmentation, yet generalization remains limited. The key bottleneck is the high cost of curating diverse, high-quality pixel annotations paired with rich linguistic supervision, which leads to brittle performance under distribution shift. Therefore, we present CORA, a semi-supervised reasoning segmentation framework that jointly learns from limited labeled data and a large corpus of unlabeled images. CORA introduces three main components: 1) conditional visual instructions that encode spatial and contextual relationships between objects; 2) a noisy pseudo-label filter based on the consistency of a multimodal LLM's outputs across semantically equivalent…
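As a rough illustration of the second component, a consistency-based pseudo-label filter might look like the sketch below. It rests on assumptions that the abstract does not state: agreement is measured as the mean pairwise IoU between the binary masks predicted for paraphrased queries, and a pseudo-label is kept when that score meets a threshold. The function names, the IoU criterion, and the 0.8 threshold are all hypothetical, and the paper's actual filter may differ.

```python
from itertools import combinations

import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of the same shape."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks are empty: treat them as fully agreeing (assumption).
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


def consistency_score(masks: list[np.ndarray]) -> float:
    """Mean pairwise IoU over masks predicted for equivalent queries."""
    pairs = list(combinations(masks, 2))
    if not pairs:
        raise ValueError("need at least two masks to measure consistency")
    return sum(iou(a, b) for a, b in pairs) / len(pairs)


def keep_pseudo_label(masks: list[np.ndarray], threshold: float = 0.8) -> bool:
    """Keep the pseudo-label only if the predictions agree closely enough.

    The threshold is a hypothetical value chosen for illustration.
    """
    return consistency_score(masks) >= threshold
```

For instance, if two of three masks agree exactly and the third covers twice the area, the pairwise IoUs are 1.0, 0.5, and 0.5, so the score is about 0.67 and the pseudo-label would be discarded at the 0.8 threshold.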
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
