Robust Core-Periphery Constrained Transformer for Domain Adaptation
Xiaowei Yu, Zeyu Zhang, Dajiang Zhu, Tianming Liu

TL;DR
This paper introduces a brain-inspired core-periphery constrained Transformer that enhances unsupervised domain adaptation by improving transferability and robustness across diverse datasets.
Contribution
It proposes a novel RCCT model that incorporates a core-periphery structure into self-attention, inspired by human brain organization, to improve domain adaptation performance.
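This summary does not give RCCT's actual formulation, so the following is only a minimal sketch of one way a core-periphery structure could constrain self-attention. It assumes a binary graph in which two tokens may attend to each other only if at least one of them is a "core" node. The function names, the hard mask, and the choice of core nodes are all hypothetical illustrations, not the authors' method.

```python
import math


def core_periphery_mask(num_tokens, core_ids):
    """Assumed core-periphery graph: pair (i, j) is connected when at least
    one endpoint is a core node. Every node also keeps a self-loop."""
    core = set(core_ids)
    return [[i == j or i in core or j in core for j in range(num_tokens)]
            for i in range(num_tokens)]


def softmax(row):
    m = max(row)  # finite, because the self-loop is always allowed
    exps = [math.exp(x - m) for x in row]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(q, k, v, mask):
    """Single-head scaled dot-product attention. Pairs not in the graph
    get a score of -inf, so their attention weight is exactly zero."""
    d = len(q[0])
    out, weights = [], []
    for i, qi in enumerate(q):
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
            if mask[i][j] else float("-inf")
            for j, kj in enumerate(k)
        ]
        w = softmax(scores)
        weights.append(w)
        out.append([sum(w[j] * v[j][c] for j in range(len(v)))
                    for c in range(len(v[0]))])
    return out, weights


# Toy usage: four 2-d tokens, token 0 is the only core node.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
mask = core_periphery_mask(4, core_ids=[0])
out, weights = masked_self_attention(tokens, tokens, tokens, mask)
```

In this toy setup the core token attends to every token, while each periphery token attends only to itself and to the core. A soft variant that rescales attention scores instead of zeroing them would be an equally plausible reading of "constrained".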
Findings
Achieves state-of-the-art results on multiple UDA benchmarks.
Demonstrates robustness to noisy data through latent space perturbations.
Significantly outperforms existing methods on key datasets.
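The summary does not specify how the latent-space perturbation is performed. One reviewer below describes the paper's latent feature interaction as resembling a feature-space mix-up, so the sketch here illustrates generic feature-space mix-up only, under that assumption. The function names, the Beta-distributed mixing coefficient, and the default `alpha` are hypothetical and are not taken from the paper.

```python
import random


def mix_latent_features(source_feat, target_feat, lam):
    """Convex combination of two latent vectors (generic feature mix-up)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * s + (1.0 - lam) * t
            for s, t in zip(source_feat, target_feat)]


def perturb_batch(source_batch, target_batch, alpha=0.2, rng=None):
    """Mix each source latent with its paired target latent, drawing a
    fresh coefficient from Beta(alpha, alpha) for every pair (an assumed
    choice, common in mix-up style augmentation)."""
    rng = rng or random.Random(0)
    mixed = []
    for s, t in zip(source_batch, target_batch):
        lam = rng.betavariate(alpha, alpha)
        mixed.append(mix_latent_features(s, t, lam))
    return mixed


# Toy usage with two 3-d latent vectors per domain.
source = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
target = [[3.0, 2.0, 1.0], [1.0, 1.0, 1.0]]
noisy = perturb_batch(source, target)
```

Each mixed vector lies on the line segment between its source and target latents, so the perturbation stays within the span of features the model has already produced.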
Abstract
Unsupervised domain adaptation (UDA) aims to learn transferable representations across domains. Recently, a few UDA works have successfully applied Transformer-based methods and achieved state-of-the-art (SOTA) results. However, UDA remains challenging when there is a large gap between the source and target domains. Inspired by humans' exceptional ability to transfer knowledge from familiar to uncharted domains, we apply an organizational structure found universally in human functional brain networks, the core-periphery principle, to the design of the Transformer in order to improve its UDA performance. In this paper, we propose a novel brain-inspired robust core-periphery constrained transformer (RCCT) for unsupervised domain adaptation, which brings a large margin of performance improvement on various datasets. Specifically, in RCCT, the self-attention…
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
* The proposed methods seem fairly well motivated, interesting, and novel. The idea of suppressing domain-specific information flow and encouraging domain-invariant information flow is intuitive.
* The proposed RCCT achieves consistently good improvements across all benchmarks.
* The paper is well-written and easy to follow.
* Important ablation studies are missing.
  * There are no ablation studies on the different loss terms involved. Hence, it is unclear which losses contribute more or less to the performance. For example, is the global discriminator required if the patch discriminator is already performing a similar task?
  * In Table 5, CP and LFI are added to a baseline that already uses SCM (and SCM gives the most improvement compared to CP and LFI). But we also need to check how well CP and LFI can work o
While unsupervised domain adaptation (UDA) is a well-established problem, the majority of the algorithms have been built on CNNs. With the ever-increasing popularity of transformers in all facets of computer vision, it is important to study how these backbones work with UDA, especially with recent studies discovering the robustness of ViTs to OOD samples. Thus, the problem this paper addresses is an important and relevant one. The neuroscience-inspired methodology is also new in this
1. The presentation of the paper is very poor, which made it hard to understand the core contributions.
   - For a paper which involves modifications of an architecture (ViT), it is crucial to introduce the components of the basic architecture to contextualize the modifications and be self-contained. This paper does not do this.
   - There is no proper organization of the methodology section. Sec 3.4 uses a graph constructed in Sec 3.3.1, however, the relevance of the graph is never mentioned in the la
1. The empirical performance of RCCT is strong.
1. The novelty of this paper may be insufficient. The "core-periphery" attention is simply a slightly modified standard multi-head self-attention. "Latent feature interaction" seems to be a simple feature-space mix-up. None of them are novel. Their connection to the biological brain is simply far-fetched.
2. All the citation formats in the paper are incorrect. Many equations include errors, e.g., Eqs. (3-4).
3. As shown in Eq. (15), the proposed method introduces too many additional hyper-para
1. This paper is well-written and easy to follow.
2. The experimental results are strong, demonstrating the effectiveness of the proposed method.

1. Why does the improvement appear smaller on DomainNet than on other datasets? On DomainNet, RCCT: 46, CDTrans: 45.2, while on VisDA, RCCT: 90.7, CDTrans: 88.4. DomainNet is known to be more difficult, but this still deserves further discussion.
2. I am also curious about the performance in other domain adaptation settings, such as Partial Domain Adaptation. Could the authors discuss this?
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dense Connections · Layer Normalization · Dropout · Byte Pair Encoding · Adam · Position-Wise Feed-Forward Layer
