Deformable Attentive Visual Enhancement for Referring Segmentation Using Vision-Language Model

Alaa Dalaq; Muzammil Behzad

arXiv:2505.19242·cs.CV·May 27, 2025

Open Access

TL;DR

SegVLM is a vision-language model for referring image segmentation that integrates deformable convolutions, squeeze-and-excitation (SE) blocks, residual connections, and a new referring-aware fusion (RAF) loss to improve accuracy and generalization.

Contribution

The paper introduces SegVLM, a new model with architectural innovations and a referring-aware fusion loss for better cross-modal alignment and segmentation performance.

Findings

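01 Ablation studies indicate that each component (SE blocks, deformable convolutions, residual connections, RAF loss) contributes to segmentation accuracy.

02 The model generalizes well across datasets.

03 The model achieves state-of-the-art results in referring segmentation.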

Abstract

Image segmentation is a fundamental task in computer vision, aimed at partitioning an image into semantically meaningful regions. Referring image segmentation extends this task by using natural language expressions to localize specific objects, requiring effective integration of visual and linguistic information. In this work, we propose SegVLM, a vision-language model that incorporates architectural improvements to enhance segmentation accuracy and cross-modal alignment. The model integrates squeeze-and-excitation (SE) blocks for dynamic feature recalibration, deformable convolutions for geometric adaptability, and residual connections for deep feature learning. We also introduce a novel referring-aware fusion (RAF) loss that balances region-level alignment, boundary precision, and class imbalance. Extensive experiments and ablation studies demonstrate that each component contributes…
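To make the "dynamic feature recalibration" idea concrete, here is a minimal NumPy sketch of a standard squeeze-and-excitation block wrapped in a residual connection. It is not SegVLM's implementation: the function names, tensor shapes, reduction ratio, and random weights are illustrative assumptions, biases are omitted for brevity, and the deformable convolution and RAF loss are not shown.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def se_block(features, w_reduce, w_expand):
    """Standard SE recalibration on a single feature map (hypothetical helper).

    features: array of shape (C, H, W)
    w_reduce: array of shape (C // r, C), the bottleneck projection
    w_expand: array of shape (C, C // r), the projection back to C channels
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = features.mean(axis=(1, 2))              # shape (C,)
    # Excitation: bottleneck MLP with ReLU, then a sigmoid gate per channel.
    hidden = np.maximum(w_reduce @ squeezed, 0.0)      # shape (C // r,)
    gates = sigmoid(w_expand @ hidden)                 # shape (C,), values in (0, 1)
    # Recalibrate: scale every channel by its gate.
    return features * gates[:, None, None]


def residual_se(features, w_reduce, w_expand):
    """Residual connection around the SE block: identity path plus gated path."""
    return features + se_block(features, w_reduce, w_expand)


# Illustrative shapes and weights only; none of these values come from the paper.
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 4
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y = residual_se(x, w1, w2)
```

Because the gates lie strictly between 0 and 1, the SE path can only attenuate channels; the residual path keeps the unmodified features available to later layers, which is the usual motivation for combining the two.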

Taxonomy

Topics: Multimodal Machine Learning Applications