Towards Motion-aware Referring Image Segmentation

Chaeyun Kim; Seunghoon Yi; Yejin Kim; Yohan Jo; Joonseok Lee

arXiv:2603.17413 · cs.CV · March 19, 2026

TL;DR

This paper improves referring image segmentation on motion-related queries by combining a motion-centric data augmentation scheme with a new contrastive learning method.

Contribution

The paper proposes a motion-centric data augmentation scheme and Multimodal Radial Contrastive Learning (MRaCL), and introduces M-Bench, a new benchmark for evaluating motion-related segmentation.

Findings

1. Significant improvement on motion-centric queries across multiple RIS models

2. Maintains competitive performance on appearance-based descriptions

3. Introduces a new benchmark for motion-focused RIS evaluation

Abstract

Referring Image Segmentation (RIS) requires identifying objects from images based on textual descriptions. We observe that existing methods significantly underperform on motion-related queries compared to appearance-based ones. To address this, we first introduce an efficient data augmentation scheme that extracts motion-centric phrases from original captions, exposing models to more motion expressions without additional annotations. Second, since the same object can be described differently depending on the context, we propose Multimodal Radial Contrastive Learning (MRaCL), performed on fused image-text embeddings rather than unimodal representations. For comprehensive evaluation, we introduce a new test split focusing on motion-centric queries, and introduce a new benchmark called M-Bench, where objects are distinguished primarily by actions. Extensive experiments show our method…
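
The abstract says the contrastive objective is applied to fused image-text embeddings rather than unimodal ones, but it does not spell out the radial formulation. The sketch below is therefore not MRaCL: it swaps in a standard InfoNCE loss to illustrate what contrasting fused embeddings can look like. The pairing of a full-caption embedding with a motion-phrase embedding of the same object, the function names, the temperature, and the toy vectors are all assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE loss (NOT the paper's MRaCL objective).

    Row i of `anchors` and row i of `positives` form a positive pair;
    every other row of `positives` acts as a negative for anchor i.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i against every candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


random.seed(0)
# Hypothetical stand-ins for fused (image, full caption) embeddings.
fused_full = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
# Hypothetical fused (image, motion phrase) embeddings of the same objects,
# modelled here as small perturbations of the full-caption embeddings.
fused_motion = [[x + 0.05 * random.gauss(0, 1) for x in row] for row in fused_full]

loss_matched = info_nce(fused_full, fused_motion)
loss_shuffled = info_nce(fused_full, fused_motion[::-1])
print(loss_matched, loss_shuffled)
```

In this toy setup the loss is low when each fused embedding is paired with the embedding of the same object and high when the pairing is reversed, which is the behaviour a contrastive objective of this general kind rewards.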

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Generative Adversarial Networks and Image Synthesis