Channel Attention-Guided Cross-Modal Knowledge Distillation for Referring Image Segmentation
Chen Yang

TL;DR
This paper introduces a channel attention-guided knowledge distillation approach for referring image segmentation. It transfers high-order cross-modal correlations from a large teacher network to a smaller student, so the student performs well without adding any parameters at inference time.
Contribution
It proposes a novel distillation method that preserves the student model's independence while transferring detailed cross-modal knowledge from the teacher.
Findings
Significant performance improvements on two datasets.
No additional inference parameters required.
Effective transfer of high-order cross-modal correlations.
Abstract
Referring image segmentation (RIS) requires accurate segmentation of target regions in images according to language descriptions, which is a cross-modal task integrating vision and language. Existing RIS methods typically employ large-scale vision and language encoding models to improve performance, but their enormous parameter size severely restricts deployment in scenarios with limited computing resources. To solve this problem, this paper proposes a channel attention-guided cross-modal knowledge distillation method, which transfers the high-order fine-grained correlations between vision and language learned by the teacher network, as well as the correlations between semantic components represented by each channel, to the student network. Compared with the traditional pixel-wise relational distillation, this method not only enables the student to learn the knowledge of the teacher,…
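The abstract contrasts pixel-wise relational distillation with distillation over correlations between channels. As a rough illustration only, the sketch below computes a channel-to-channel cosine-similarity matrix for a feature map and penalizes the difference between the student's and the teacher's matrices. This is a generic channel-correlation loss, not the paper's actual formulation: the function names `channel_correlation` and `channel_distill_loss`, the use of cosine similarity and mean squared error, and the assumption that teacher and student features have the same number of channels are all hypothetical choices made for the example.

```python
import numpy as np


def channel_correlation(feat: np.ndarray) -> np.ndarray:
    """Return the (C, C) cosine-similarity matrix between channels.

    feat has shape (C, H, W). Each channel is flattened to a vector of
    length H*W and L2-normalized before taking pairwise dot products.
    """
    c = feat.shape[0]
    flat = feat.reshape(c, -1)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    flat = flat / np.maximum(norms, 1e-8)  # avoid division by zero
    return flat @ flat.T


def channel_distill_loss(student_feat: np.ndarray, teacher_feat: np.ndarray) -> float:
    """Mean squared difference between student and teacher channel correlations.

    Assumption (not from the paper): both feature maps have the same channel
    count C. Spatial sizes may differ, because each correlation matrix is (C, C).
    """
    s = channel_correlation(student_feat)
    t = channel_correlation(teacher_feat)
    return float(np.mean((s - t) ** 2))


# Hypothetical feature maps: same channel count, different spatial resolution.
rng = np.random.default_rng(0)
teacher_feat = rng.standard_normal((8, 16, 16))
student_feat = rng.standard_normal((8, 8, 8))
loss = channel_distill_loss(student_feat, teacher_feat)
```

Because the loss is applied only during training, a term of this kind adds nothing to the student at inference time, which is in line with the "no additional inference parameters" finding listed above.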
