Spatio-Semantic Expert Routing Architecture with Mixture-of-Experts for Referring Image Segmentation
Alaa Dalaq, Muzammil Behzad

TL;DR
This paper introduces SERA, an architecture for referring image segmentation that combines expert routing with expression-aware refinement, significantly improving spatial coherence and boundary accuracy over existing methods.
Contribution
SERA employs a two-stage expert refinement approach with a lightweight routing mechanism, enhancing spatial and boundary accuracy while maintaining compatibility with pretrained models.
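The summary does not describe how the routing mechanism works. The sketch below is a generic expression-conditioned soft mixture of experts written under stated assumptions; it is not the authors' implementation. It assumes each visual token gets its own gate distribution over a few expert refiners, computed from both the token's visual content and a pooled embedding of the referring expression, and that the gated expert outputs are added back as a residual. The names `route_and_refine`, `w_vis`, `w_txt`, and the tanh-linear experts are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def route_and_refine(visual, text_emb, w_vis, w_txt, experts):
    """Hypothetical expression-conditioned soft mixture-of-experts refinement.

    visual:   (N, D) visual tokens, e.g. from a frozen backbone
    text_emb: (T,)   pooled embedding of the referring expression
    w_vis:    (D, E) assumed router weights on visual content
    w_txt:    (T, E) assumed router weights on the expression
    experts:  list of E callables, each mapping (N, D) -> (N, D)
    """
    # Per-token routing logits combine spatial (visual) and semantic (text) cues.
    logits = visual @ w_vis + text_emb @ w_txt        # (N, E) via broadcasting
    gates = softmax(logits, axis=-1)                  # each row sums to 1
    outputs = np.stack([f(visual) for f in experts])  # (E, N, D)
    mixed = np.einsum("ne,end->nd", gates, outputs)   # gate-weighted sum per token
    return visual + mixed, gates                      # residual refinement


# Usage with random weights and small tanh-linear experts (all hypothetical).
rng = np.random.default_rng(0)
N, D, T, E = 16, 8, 4, 3
visual = rng.standard_normal((N, D))
text_emb = rng.standard_normal(T)
w_vis = rng.standard_normal((D, E))
w_txt = rng.standard_normal((T, E))
expert_ws = [0.1 * rng.standard_normal((D, D)) for _ in range(E)]
experts = [lambda x, w=w: np.tanh(x @ w) for w in expert_ws]
refined, gates = route_and_refine(visual, text_emb, w_vis, w_txt, experts)
print(refined.shape, gates.shape)  # (16, 8) (16, 3)
```

Soft gating keeps every expert differentiable and cheap when the expert count is small; a top-k variant that runs only the highest-scoring experts per token is a common alternative when compute matters more.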
Findings
Outperforms strong baselines on standard benchmarks.
Achieves notable improvements on spatial localization tasks.
Remains effective when only a small number of parameters are updated during fine-tuning.
Abstract
Referring image segmentation aims to produce a pixel-level mask for the image region described by a natural-language expression. Although pretrained vision-language models have improved semantic grounding, many existing methods still rely on uniform refinement strategies that do not fully match the diverse reasoning requirements of referring expressions. Because of this mismatch, predictions often contain fragmented regions, inaccurate boundaries, or even the wrong object, especially when pretrained backbones are frozen for computational efficiency. To address these limitations, we propose SERA, a Spatio-Semantic Expert Routing Architecture for referring image segmentation. SERA introduces lightweight, expression-aware expert refinement at two complementary stages within a vision-language framework. First, we design SERA-Adapter, which inserts an expression-conditioned adapter into…
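The abstract is cut off before it specifies SERA-Adapter's design, so the following is not the authors' adapter. It is a minimal sketch, assuming a standard bottleneck adapter on top of frozen backbone features whose hidden activations are modulated by FiLM (feature-wise linear modulation) parameters predicted from a pooled expression embedding. The class name `ExpressionAdapter` and all dimensions are hypothetical. The up-projection is zero-initialized so the adapter starts as an identity map and cannot disturb the pretrained features before training.

```python
import numpy as np


class ExpressionAdapter:
    """Hypothetical expression-conditioned bottleneck adapter (FiLM-style).

    Only these weights would be trained; the backbone producing `x` stays frozen.
    """

    def __init__(self, d_model: int, d_text: int, r: int, rng: np.random.Generator):
        self.down = 0.02 * rng.standard_normal((d_model, r))     # D -> r
        self.up = np.zeros((r, d_model))                         # r -> D, zero-init
        self.film = 0.02 * rng.standard_normal((d_text, 2 * r))  # text -> (gamma, beta)

    def __call__(self, x: np.ndarray, text_emb: np.ndarray) -> np.ndarray:
        gamma, beta = np.split(text_emb @ self.film, 2)  # each of shape (r,)
        h = np.maximum(x @ self.down, 0.0)               # ReLU bottleneck, (N, r)
        h = h * (1.0 + gamma) + beta                     # expression-conditioned modulation
        return x + h @ self.up                           # residual update of frozen features

    def num_params(self) -> int:
        return self.down.size + self.up.size + self.film.size


rng = np.random.default_rng(0)
adapter = ExpressionAdapter(d_model=256, d_text=128, r=16, rng=rng)
x = rng.standard_normal((196, 256))  # e.g. 14x14 patch tokens from a frozen encoder
text_emb = rng.standard_normal(128)
out = adapter(x, text_emb)
print(out.shape, adapter.num_params())  # (196, 256) 12288
```

With these illustrative sizes the adapter has 12,288 trainable weights, compared with 65,536 for a single dense 256×256 projection, which shows why adapter-style conditioning is attractive when the backbone is kept frozen.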
