WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
Yang Liu, Silin Cheng, Xinwei He, Sebastien Ourselin, Lei Tan, Gen Luo

TL;DR
WeakMCN is a multi-task collaborative network that jointly trains weakly supervised referring expression comprehension (WREC) and segmentation (WRES) in a dual-branch architecture, improving both tasks over single-task methods.
Contribution
The paper proposes WeakMCN, a multi-task framework with a dual-branch architecture, Dynamic Visual Feature Enhancement (DVFE), and a Collaborative Consistency Module (CCM), which together enable joint learning of the REC and RES tasks under weak supervision.
Findings
Improves by up to 3.91% on WREC and 13.11% on WRES on RefCOCO.
Generalizes to semi-supervised settings, with gains of +8.94% and +7.71%.
Outperforms state-of-the-art single-task methods on multiple benchmarks.
Abstract
Weakly supervised referring expression comprehension (WREC) and segmentation (WRES) aim to learn object grounding based on a given expression using weak supervision signals like image-text pairs. While these tasks have traditionally been modeled separately, we argue that they can benefit from joint learning in a multi-task framework. To this end, we propose WeakMCN, a novel multi-task collaborative network that effectively combines WREC and WRES with a dual-branch architecture. Specifically, the WREC branch is formulated as anchor-based contrastive learning, which also acts as a teacher to supervise the WRES branch. In WeakMCN, we propose two innovative designs to facilitate multi-task collaboration, namely Dynamic Visual Feature Enhancement (DVFE) and Collaborative Consistency Module (CCM). DVFE dynamically combines various pre-trained visual knowledge to meet different task requirements,…
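To make the phrase "anchor-based contrastive learning" concrete, the following is a minimal sketch of one common way to set up such an objective. It is an illustration, not the paper's implementation: the function names, the cosine similarity, the InfoNCE-style loss, and the temperature value are all assumptions. Each expression is matched to the most similar anchor in its own image (the positive) and contrasted against the best anchors of the other images in the batch (the negatives), so only image-text pairs are needed as supervision.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def best_anchor(anchors, text):
    """Return (index, similarity) of the anchor most similar to the text."""
    sims = [cosine(a, text) for a in anchors]
    idx = max(range(len(sims)), key=sims.__getitem__)
    return idx, sims[idx]


def anchor_contrastive_loss(batch_anchors, batch_texts, temperature=0.1):
    """InfoNCE-style loss (an assumed form) averaged over the batch.

    batch_anchors[j] holds the anchor feature vectors of image j;
    batch_texts[i] is the expression feature paired with image i.
    """
    losses = []
    for i, text in enumerate(batch_texts):
        # Score expression i against the best-matching anchor of every image.
        logits = [best_anchor(anchors, text)[1] / temperature
                  for anchors in batch_anchors]
        # Numerically stable log-sum-exp over all images in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Negative log-probability of the paired image.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)


# Toy batch: two images with two 2-D anchors each, and two expressions.
toy_anchors = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]
toy_texts = [[1.0, 0.0], [0.0, -1.0]]
matched_loss = anchor_contrastive_loss(toy_anchors, toy_texts)
mismatched_loss = anchor_contrastive_loss(toy_anchors, toy_texts[::-1])
```

In this toy batch the correctly paired expressions give a loss close to zero, while swapping the expressions gives a much larger one. At inference time the anchor selected by `best_anchor` would supply the predicted box; how that prediction is turned into supervision for the segmentation branch is not specified in the text above and is left out of the sketch.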
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Speech and dialogue systems
