Channel Attention-Guided Cross-Modal Knowledge Distillation for Referring Image Segmentation

Chen Yang

arXiv:2604.16806 · cs.CV · April 21, 2026

TL;DR

This paper introduces a channel attention-guided cross-modal knowledge distillation approach for referring image segmentation. It transfers high-order cross-modal correlations from a large teacher network to a smaller student, improving the student's performance without adding parameters at inference time.

Contribution

It proposes a distillation method that transfers fine-grained cross-modal knowledge from the teacher while leaving the student model independent of the teacher at inference, with no additional parameters.

Findings

01. Significant performance improvements on two datasets.

02. No additional inference parameters required.

03. Effective transfer of high-order cross-modal correlations.

Abstract

Referring image segmentation (RIS) requires accurate segmentation of target regions in images according to language descriptions, making it a cross-modal task that integrates vision and language. Existing RIS methods typically employ large-scale vision and language encoders to improve performance, but their enormous parameter counts severely restrict deployment in scenarios with limited computing resources. To solve this problem, this paper proposes a channel attention-guided cross-modal knowledge distillation method, which transfers to the student network both the high-order fine-grained correlations between vision and language learned by the teacher network and the correlations between the semantic components represented by each channel. Compared with traditional pixel-wise relational distillation, this method not only enables the student to learn the knowledge of the teacher,…
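
To make the idea of distilling cross-modal correlations concrete, here is a minimal sketch of a generic relational distillation term. It is not the paper's loss: the abstract is truncated and does not give the formulation, and the channel attention guidance is not modeled here. The function names, the cosine-similarity affinity, the mean-squared-error objective, and all tensor shapes are assumptions chosen for illustration. The sketch compares pixel-to-word affinity matrices, whose shape is independent of the channel count, so a narrow student can be matched against a wide teacher without any projection layer.

```python
import numpy as np


def cross_modal_affinity(visual, text):
    """Cosine similarity between every pixel feature and every word feature.

    visual: (N, C) array of N flattened pixel features.
    text:   (T, C) array of T word features in the same C-dimensional space.
    Returns an (N, T) matrix; its shape does not depend on C.
    """
    v = visual / (np.linalg.norm(visual, axis=1, keepdims=True) + 1e-8)
    t = text / (np.linalg.norm(text, axis=1, keepdims=True) + 1e-8)
    return v @ t.T


def affinity_distillation_loss(student_v, student_t, teacher_v, teacher_t):
    """Mean squared error between student and teacher affinity matrices.

    Hypothetical loss for illustration only (an assumption, not the paper's).
    """
    a_student = cross_modal_affinity(student_v, student_t)
    a_teacher = cross_modal_affinity(teacher_v, teacher_t)
    return float(np.mean((a_student - a_teacher) ** 2))


# Toy features: 16 pixels, 5 words; teacher uses 256 channels, student 64.
rng = np.random.default_rng(0)
teacher_v = rng.normal(size=(16, 256))
teacher_t = rng.normal(size=(5, 256))
student_v = rng.normal(size=(16, 64))
student_t = rng.normal(size=(5, 64))

loss = affinity_distillation_loss(student_v, student_t, teacher_v, teacher_t)
print(f"affinity distillation loss: {loss:.4f}")
```

In a training loop, a term like this would be added to the student's segmentation loss and dropped at inference, which is consistent with the claim that distillation adds no inference parameters.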
