SWAT: Sliding Window Adversarial Training for Gradual Domain Adaptation
Zixi Wang, Xiangxu Zhao, Tonglan Xie, Mengmeng Jing, Lin Zuo

TL;DR
SWAT introduces a novel sliding window adversarial training approach for gradual domain adaptation, effectively reducing domain shifts through intermediate domains and outperforming previous methods on multiple benchmarks.
Contribution
The paper proposes SWAT, a new method that uses a sliding window over adversarial streams to improve gradual domain adaptation performance.
Findings
Achieves 6.1% improvement on Rotated MNIST
Attains 4.1% advantage on CIFAR-100C
Demonstrates significant effectiveness across six benchmarks
Abstract
Domain shifts are critical issues that harm the performance of machine learning. Unsupervised Domain Adaptation (UDA) mitigates this issue but suffers when the domain shifts are steep and drastic. Gradual Domain Adaptation (GDA) alleviates this problem in a mild way by gradually adapting from the source to the target domain using multiple intermediate domains. In this paper, we propose Sliding Window Adversarial Training (SWAT) for GDA. SWAT first formulates adversarial streams to connect the feature spaces of the source and target domains. Then, a sliding window paradigm is designed that moves along the adversarial stream to gradually narrow the small gap between adjacent intermediate domains. When the window moves to the end of the stream, i.e., the target domain, the domain shift is explicitly reduced. Extensive experiments on six GDA benchmarks demonstrate the significant…
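The windowing step the abstract describes can be illustrated with a minimal sketch. It assumes the stream is a discrete, ordered list of domains and that each window simply selects adjacent domains to align. The names `sliding_windows` and `adjacent_pairs` and the domain labels are hypothetical, and this summary does not say how the paper builds the stream or trains within a window.

```python
from typing import Iterator, List, Sequence, Tuple


def sliding_windows(stream: Sequence[str], size: int = 2) -> Iterator[Tuple[str, ...]]:
    """Yield consecutive windows of `size` domains, moving one step at a time.

    The first window starts at the source and the last window ends at the target.
    """
    if size < 2 or size > len(stream):
        raise ValueError("window size must be between 2 and len(stream)")
    for start in range(len(stream) - size + 1):
        yield tuple(stream[start:start + size])


def adjacent_pairs(window: Sequence[str]) -> List[Tuple[str, str]]:
    """Return the neighbouring domain pairs inside one window."""
    return list(zip(window[:-1], window[1:]))


# Hypothetical domain stream: source, three intermediate domains, target.
stream = ["source", "mid_1", "mid_2", "mid_3", "target"]

for window in sliding_windows(stream, size=3):
    # Assumption: only neighbouring domains inside the window are aligned.
    for left, right in adjacent_pairs(window):
        print(f"window {window}: align {left} -> {right}")
```

Each step in this sketch only compares domains that sit next to each other, so no single step has to bridge the full source-to-target gap.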
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
* This paper proposes an effective method on GDA, and the ablation studies validate the effectiveness of each part of the method.
* This paper offers limited innovation over prior work. Prior work [1] uses a very similar sliding window idea. Lines 48-49 claim that the sliding window mechanism is a new training paradigm in GDA, but [1] also uses a sliding window and a similar parameter $\rho$ to shift from source to target. The authors do not compare against [1] or explain what this work adds.
* Many statements in the paper are not clear. Lines 42-46 make no sense to me. What d
- First of all, the paper frames Gradual Domain Adaptation through a continuous domain flow formulation, bridging the gap between static UDA and dynamic adaptation with a hypothesis of smooth probability transitions in feature space. Technically, the sliding window paradigm enables localized and incremental adaptation between neighboring domains, ensuring stable alignment and preventing abrupt feature shifts.
- The overall framework is well designed and implemented according to th
- One concern is that the proposed method assumes monotonic and smooth domain sequences, which may limit its applicability to non-linear or multi-factor domain shifts. If the relationships among domains are non-linear or influenced by multiple independent factors, further validation is needed to determine whether the proposed assumptions remain valid and whether the method can still operate effectively under such complex domain transition scenarios.
- Although the paper shows few grammatical err
- Decomposes a hard domain shift into many tiny “micro-transfers,” which reduces instability and negative transfer.
- The sliding window only matches nearby domains, not global distributions, which makes adversarial training smoother and easier to optimize.
- The bidirectional adversarial training with consistency regularization is reasonable.
- Experimental results on the included datasets seem to be good.
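The reviews describe the training signal as an adversarial loss between neighbouring domains plus a cycle-consistency term, and identify the adversarial part as WGAN-style. The sketch below shows the standard textbook forms of those losses on plain Python lists. It assumes an L1 cycle penalty and omits any Lipschitz constraint on the critic. All function names are hypothetical, and nothing here is taken from the authors' implementation.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def wgan_critic_loss(scores_real: Sequence[float], scores_fake: Sequence[float]) -> float:
    """Standard WGAN critic objective to minimize: E[D(fake)] - E[D(real)]."""
    return mean(scores_fake) - mean(scores_real)


def wgan_generator_loss(scores_fake: Sequence[float]) -> float:
    """Standard WGAN generator objective to minimize: -E[D(fake)]."""
    return -mean(scores_fake)


def cycle_consistency_loss(
    batch: Sequence[Vector],
    forward: Callable[[Vector], Vector],
    backward: Callable[[Vector], Vector],
) -> float:
    """Mean absolute error between x and backward(forward(x)) (L1 is an assumption)."""
    total, count = 0.0, 0
    for x in batch:
        x_cycled = backward(forward(x))
        total += sum(abs(a - b) for a, b in zip(x, x_cycled))
        count += len(x)
    return total / count


# Toy maps between two adjacent domains: a shift and its exact inverse.
to_next = lambda x: [v + 0.5 for v in x]
to_prev = lambda y: [v - 0.5 for v in y]

features = [[1.0, 2.0], [3.0, 4.0]]
print(cycle_consistency_loss(features, to_next, to_prev))   # exact inverse, so no cycle error
print(wgan_critic_loss([1.0, 3.0], [0.0, 2.0]))             # critic scores real above fake
```

In a bidirectional setup, one such pair of maps would be trained per neighbouring pair of domains, with the cycle term discouraging either map from discarding information the other needs.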
- Although the sliding window is a clever idea, the techniques used are common in the DA community. Specifically, adversarial training with a WGAN loss and a cycle-consistency loss has been widely investigated in DA and CycleGAN. Thus, I feel that the sliding window addresses few challenges in GDA, and the contribution is limited.
- SWAT trains generators, discriminators, and classifiers jointly over many sliding windows with a curriculum over `p`. This is heavier than standard self-training or
1. Comprehensive experimental validation on six benchmarks, with consistent performance gains and strong ablations.
2. Clear conceptual motivation: the local-to-global adaptation perspective is easy to understand and practically effective.
3. Good empirical robustness under different corruption levels (CIFAR-10C/100C), suggesting the method’s general applicability.
4. Well-written and reproducible: detailed experimental setup, clear algorithmic description, and open-sourcing commitment.
1. The sliding-window idea overlaps strongly with existing notions of progressive, incremental, or curriculum-based domain alignment. The adversarial training procedure is a direct adaptation of standard WGAN.
2. No analysis of stability, convergence, or relation to domain discrepancy bounds (e.g., HΔH or Wasserstein continuity).
3. The paper reads more like an improved implementation of GDA rather than an algorithmic innovation for representation learning.
4. Nearly all benchmarks are synthe
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning
