Learning Spectral-Decomposed Tokens for Domain Generalized Semantic Segmentation

Jingjun Yi; Qi Bi; Hao Zheng; Haolan Zhan; Wei Ji; Yawen Huang; Yuexiang Li; Yefeng Zheng

arXiv:2407.18568·cs.CV·July 30, 2024

TL;DR

This paper introduces the Spectral-dEcomposed Token (SET) learning framework, which improves domain generalization in semantic segmentation by decomposing frozen VFM features into amplitude (mainly style) and phase (mainly content) components in frequency space and learning style-invariant features from them.

Contribution

The SET framework decomposes frozen VFM features into phase and amplitude components in frequency space and employs an attention method to improve style-invariant feature extraction for better cross-domain segmentation.

Findings

01 Achieves state-of-the-art results on cross-domain semantic segmentation tasks.

02 Effectively separates style and content information in frequency space.

03 Enhances style-invariant feature learning through attention optimization.

Abstract

The rapid development of Vision Foundation Models (VFMs) brings inherent out-of-domain generalization to a variety of downstream tasks. Among them, domain generalized semantic segmentation (DGSS) poses unique challenges, as cross-domain images share common pixel-wise content information but vary greatly in style. In this paper, we present a novel Spectral-dEcomposed Token (SET) learning framework to advance the frontier. Going beyond the existing paradigm of fine-tuning tokens on a frozen backbone, the proposed SET focuses on how to learn style-invariant features from these learnable tokens. In particular, the frozen VFM features are first decomposed into phase and amplitude components in the frequency space, which mainly contain content and style information, respectively, and are then separately processed by learnable tokens for task-specific…
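
The phase/amplitude decomposition mentioned in the abstract can be illustrated with a 2D Fourier transform over the spatial axes of a feature map. The following is a minimal sketch, not the authors' implementation: it uses NumPy rather than the repository's PyTorch code, the function names `decompose` and `recompose` are hypothetical, and random arrays stand in for frozen VFM features. The learnable tokens and the attention step are not modeled here.

```python
import numpy as np


def decompose(feat):
    """Split a (C, H, W) feature map into amplitude and phase.

    The FFT is taken over the two spatial axes. Per the abstract, amplitude
    mainly carries style and phase mainly carries content.
    """
    spectrum = np.fft.fft2(feat, axes=(-2, -1))
    return np.abs(spectrum), np.angle(spectrum)


def recompose(amplitude, phase):
    """Rebuild a spatial feature map from amplitude and phase."""
    spectrum = amplitude * np.exp(1j * phase)
    # The imaginary part is numerical noise when the spectrum comes from
    # real-valued inputs, so only the real part is kept.
    return np.fft.ifft2(spectrum, axes=(-2, -1)).real


# Stand-in for a frozen backbone feature map (assumption: shape C x H x W).
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))

amplitude, phase = decompose(feat)
restored = recompose(amplitude, phase)

# Illustration only: pairing this map's phase with another map's amplitude
# keeps the layout information while changing the "style" statistics.
other = 3.0 * rng.standard_normal((4, 8, 8)) + 1.0
other_amplitude, _ = decompose(other)
mixed = recompose(other_amplitude, phase)
```

Because the two components are separate arrays, each can be processed on its own before recombination, which is the property the abstract relies on when it says they are "separately processed by learnable tokens".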

Code & Models

Repositories

JingjunYi/SET
pytorch

Taxonomy

Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training