DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy

Ming Dai; Wenxuan Cheng; Jiang-jiang Liu; Sen Yang; Wenxiao Cai; Yanpeng Sun; Wankou Yang

arXiv:2507.01738 · cs.CV · October 14, 2025

TL;DR

DeRIS is a modular framework for referring image segmentation (RIS) that decouples perception from cognition, links the two through a Loopback Synergy mechanism to strengthen multi-modal understanding, and uses data augmentation to counter long-tail data imbalance, improving segmentation accuracy and adaptability to non- and multi-referent scenarios.

Contribution

The paper proposes DeRIS, a novel modular approach that systematically analyzes and enhances perception and cognition in RIS through loopback synergy and data augmentation.

Findings

1. Loopback synergy improves segmentation accuracy.
2. DeRIS effectively handles both non- and multi-referent scenarios.
3. Data augmentation addresses long-tail distribution issues.

Abstract

Referring Image Segmentation (RIS) is a challenging task that aims to segment objects in an image based on natural language expressions. While prior studies have predominantly concentrated on improving vision-language interactions and achieving fine-grained localization, a systematic analysis of the fundamental bottlenecks in existing RIS frameworks remains underexplored. To bridge this gap, we propose DeRIS, a novel framework that decomposes RIS into two key components: perception and cognition. This modular decomposition facilitates a systematic analysis of the primary bottlenecks impeding RIS performance. Our findings reveal that the predominant limitation lies not in perceptual deficiencies, but in the insufficient multi-modal cognitive capacity of current models. To mitigate this, we propose a Loopback Synergy mechanism, which enhances the synergy between the perception and…

Code & Models

Repositories

Dmmm1997/DeRIS (PyTorch, official)

Models

chengwenxuan7/DeRIS (Hugging Face model)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Visual Attention and Saliency Detection