MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation

Minhyun Lee; Seungho Lee; Song Park; Dongyoon Han; Byeongho Heo; Hyunjung Shim

arXiv:2411.19067 · cs.CV · November 20, 2025

TL;DR

MaskRIS introduces a novel data augmentation framework for Referring Image Segmentation that employs image and text masking combined with distortion-aware learning, significantly improving robustness and achieving state-of-the-art results.

Contribution

The paper proposes MaskRIS, a training framework that combines image and text masking with Distortion-aware Contextual Learning (DCL) to improve RIS performance over existing methods.

Findings

1. Outperforms existing RIS methods on multiple datasets
2. Improves robustness to occlusions and linguistic complexities
3. Is effective in both supervised and weakly supervised settings

Abstract

Referring Image Segmentation (RIS) is an advanced vision-language task that involves identifying and segmenting objects within an image as described by free-form text descriptions. While previous studies focused on aligning visual and language features, training techniques such as data augmentation remain underexplored. In this work, we explore effective data augmentation for RIS and propose a novel training framework called Masked Referring Image Segmentation (MaskRIS). We observe that conventional image augmentations fall short for RIS, degrading performance, while simple random masking significantly improves it. MaskRIS uses both image and text masking, followed by Distortion-aware Contextual Learning (DCL) to fully exploit the benefits of the masking strategy. This approach can improve the model's robustness to occlusions, incomplete…
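The abstract names two ingredients: random masking of both the image and the text, and DCL applied on top. The following is a minimal pure-Python sketch under stated assumptions, not the official naver-ai/maskris code. It assumes square patch masking for images and whole-token replacement with BERT's `[MASK]` token for text. For DCL it assumes a hypothetical two-path objective in which the prediction on the masked input is supervised and also pulled toward the prediction on the clean input. The abstract does not specify DCL's formulation, the patch size, or the mask ratios, so `mask_image_patches`, `mask_tokens`, `two_path_loss`, and every default below are illustrative assumptions.

```python
import math
import random
from typing import List, Sequence

Image = List[List[float]]  # H x W grayscale image; a stand-in for a real tensor


def mask_image_patches(image: Image, patch: int, ratio: float,
                       rng: random.Random, fill: float = 0.0) -> Image:
    """Return a copy of `image` with a random `ratio` of patch x patch blocks set to `fill`."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]  # copy so the clean input stays available
    grid_h, grid_w = math.ceil(h / patch), math.ceil(w / patch)
    blocks = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    # Sample blocks without replacement, then overwrite every pixel inside each one.
    for r, c in rng.sample(blocks, round(ratio * len(blocks))):
        for y in range(r * patch, min((r + 1) * patch, h)):
            for x in range(c * patch, min((c + 1) * patch, w)):
                out[y][x] = fill
    return out


def mask_tokens(tokens: Sequence[str], ratio: float, rng: random.Random,
                mask_token: str = "[MASK]") -> List[str]:
    """Return a copy of `tokens` with a random `ratio` of them replaced by `mask_token`."""
    out = list(tokens)
    for i in rng.sample(range(len(out)), round(ratio * len(out))):
        out[i] = mask_token
    return out


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean per-pixel binary cross-entropy; predictions are clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def two_path_loss(clean_pred: Sequence[float], masked_pred: Sequence[float],
                  gt: Sequence[float], weight: float = 1.0) -> float:
    """Hypothetical DCL-style objective: supervise both paths, plus a consistency term.

    The consistency term (mean squared difference) treats `clean_pred` as a fixed
    target for `masked_pred`; in an autograd framework it would be detached.
    """
    consistency = sum((m - c) ** 2 for m, c in zip(masked_pred, clean_pred)) / len(gt)
    return bce(clean_pred, gt) + bce(masked_pred, gt) + weight * consistency
```

For example, masking an 8×8 image with `patch=2` and `ratio=0.5` blanks 8 of the 16 blocks, or 32 pixels. Both masking functions return copies rather than modifying their inputs, because the hypothetical two-path loss needs the clean image and text alongside the masked versions.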

Code & Models

Repositories

naver-ai/maskris (PyTorch, official)

Taxonomy

Topics: AI in cancer detection · Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques

Methods: Linear Layer · Weight Decay · Attention Dropout · Linear Warmup With Linear Decay · WordPiece · BERT · Vision Transformer · Stochastic Depth · Swin Transformer