MoSSDA: A Semi-Supervised Domain Adaptation Framework for Multivariate Time-Series Classification using Momentum Encoder
Seonyoung Kim, Dongil Kim

TL;DR
MoSSDA is a novel semi-supervised domain adaptation framework that leverages momentum encoders and contrastive learning to improve multivariate time-series classification across different domains with limited labeled data.
Contribution
It introduces a two-step momentum encoder-based SSDA framework with a mixup-enhanced contrastive module for robust, domain-invariant feature learning in time-series classification.
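A momentum encoder is commonly implemented, as in MoCo, as an exponential moving average (EMA) of the weights of a gradient-trained encoder. The sketch below assumes that convention. The coefficient `m = 0.99`, the function name `momentum_update`, and the flat parameter dictionaries are illustrative assumptions, not details taken from MoSSDA.

```python
from typing import Dict, List

# Hypothetical flat parameter representation: tensor name -> list of weights.
Params = Dict[str, List[float]]


def momentum_update(query: Params, key: Params, m: float = 0.99) -> Params:
    """Return updated momentum-encoder params: key <- m * key + (1 - m) * query.

    The key (momentum) encoder receives no gradients. It only tracks an
    exponential moving average of the query encoder's weights.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("momentum m must be in [0, 1]")
    return {
        name: [m * k + (1.0 - m) * q for q, k in zip(query[name], key[name])]
        for name in key
    }
```

A coefficient close to 1 makes the momentum encoder change slowly. This stability is what lets its features serve as consistent targets for contrastive learning.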
Findings
Achieved state-of-the-art results on six datasets.
Effective across multiple backbone architectures.
Robust performance with limited labeled target data.
Abstract
Deep learning has emerged as the most promising approach in various fields; however, when the distributions of training and test data differ (domain shift), the performance of deep learning models can degrade. Semi-supervised domain adaptation (SSDA) is a major approach for addressing this issue. It assumes that a fully labeled training set (source domain) is available, while the test set (target domain) provides labels for only a small subset. In this study, we propose MoSSDA, a novel two-step SSDA framework based on momentum encoders, for multivariate time-series classification. Time-series data are highly sensitive to noise, and sequential dependencies cause domain shifts that result in critical performance degradation. To obtain a robust, domain-invariant and class-discriminative representation, MoSSDA employs a domain-invariant encoder to learn features from both source and target…
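The following sketch makes the SSDA data regime concrete. It withholds labels from a fraction of the target-domain training indices, while the source domain stays fully labeled. It assumes that the "unlabeled ratio" cited in the reviews (0.7, 0.9, 0.95) is the fraction of target samples whose labels are hidden. `split_target` and its seeded shuffle are hypothetical helpers, not part of MoSSDA.

```python
import random
from typing import List, Sequence, Tuple


def split_target(
    indices: Sequence[int], unlabeled_ratio: float, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hide labels for `unlabeled_ratio` of the target training indices.

    Returns (labeled, unlabeled) index lists. The source domain is assumed
    to be fully labeled and is not touched here.
    """
    if not 0.0 <= unlabeled_ratio < 1.0:
        raise ValueError("unlabeled_ratio must be in [0, 1)")
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)  # reproducible split
    n_unlabeled = int(round(unlabeled_ratio * len(shuffled)))
    labeled = sorted(shuffled[n_unlabeled:])
    unlabeled = sorted(shuffled[:n_unlabeled])
    return labeled, unlabeled
```

With 100 target samples and a ratio of 0.9, only 10 keep their labels. This is the "small subset" regime the abstract describes.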
Peer Reviews
Decision · Submitted to ICLR 2026
1. The paper is well-motivated and well-structured. 2. Comprehensive experiments: the evaluation spans 6 diverse datasets, 3 backbone architectures, and multiple unlabeled ratios (0.7, 0.9, 0.95), demonstrating consistent improvements over strong baselines. The authors also provide an ablation study that clearly demonstrates the contribution of each component, with the positive contrastive module showing the most significant impact.
Limited novelty. The combination of MMD loss for domain alignment, mixup-enhanced contrastive learning, and momentum encoding is creative and well-justified for time-series data where traditional augmentations can disrupt temporal dependencies. While the combination is novel, the individual components (MMD loss, contrastive learning, momentum encoding) are well-established techniques. The main contribution is their integration for time-series SSDA.
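The MMD loss named in this review measures the distance between source and target feature distributions using their mean kernel embeddings. The sketch below computes the biased estimate of squared MMD with a single Gaussian kernel. The kernel choice, the bandwidth `sigma`, and the helper names are assumptions, because the exact form MoSSDA uses is not stated here.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(source: List[Vector], target: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between two sets of feature vectors.

    MMD^2 = E[k(s, s')] + E[k(t, t')] - 2 * E[k(s, t)]
    """
    def mean_k(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(x, y, sigma) for x in a for y in b)
        return total / (len(a) * len(b))

    return mean_k(source, source) + mean_k(target, target) - 2.0 * mean_k(source, target)
```

Minimizing this quantity over encoder outputs pulls the two domains' feature distributions together. Multi-kernel variants that combine several bandwidths are a common alternative to a single fixed `sigma`.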
Strengths: 1. The two-stage decoupled training framework proposed in this paper is an effective strategy for handling complex multi-objective optimization problems. It avoids gradient conflicts between feature extraction and classifier training, which makes a significant contribution to optimization stability compared with existing SSDA methods. 2. The paper cleverly applies Mixup in SSDA by operating in the feature space rather than the input space, enhancing class discriminability wit
Weaknesses: 1. The original theoretical foundation of Mixup lies in encouraging the model to perform linear interpolation between training samples to enhance the model's generalization ability. This paper applies it in the feature space combined with contrastive learning. Is the theoretical effectiveness of this feature space Mixup still equivalent to that of input space Mixup? For the virtual positive samples generated by feature space Mixup in the feature space of time series, can they truly r
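The sketch below shows the feature-space Mixup these reviews discuss: a virtual positive is formed as a convex combination of two embeddings and then scored with an InfoNCE-style contrastive loss. The details are assumptions. This section does not specify how MoSSDA chooses the pair to mix or draws the coefficient `lam` (standard Mixup samples it from a Beta(α, α) distribution), and `mix`, `cosine`, and `info_nce` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mix(z_a: Vector, z_b: Vector, lam: float) -> List[float]:
    """Feature-space Mixup: convex combination lam * z_a + (1 - lam) * z_b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / max(norm_u * norm_v, 1e-12)  # guard against zero vectors


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector], tau: float = 0.1) -> float:
    """-log( exp(s_pos / tau) / sum over all candidates of exp(s / tau) )."""
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_denom = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denom - logits[0]


# Usage: mix an anchor with another same-class feature to get a virtual positive.
anchor = [1.0, 0.0]
virtual_positive = mix(anchor, [0.9, 0.1], lam=0.5)
loss = info_nce(anchor, virtual_positive, negatives=[[-1.0, 0.0], [0.0, 1.0]])
```

Because the mixed vector lies between two same-class embeddings, it yields a positive without any input-space augmentation. That is the property the reviewers credit for preserving temporal dependencies, and also the point whose validity the weakness above questions.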
1. The components of this method are well-motivated based on the problem the authors present. Each component seems to have a separate but important place that meets the challenges of domain adaptation for time series. 2. The benchmarking results are very impressive compared to baselines, showing substantial improvements in this setting across multiple datasets. 3. This method demonstrates how a combination of methods that have been tested extensively in other research domains, such as mix-up, Mo
1. The proposed method is only applicable in somewhat limited domain adaptation settings. The motivation for semi-supervised DA is clear, but the authors only consider the case where the label spaces of the source and target domains are known to overlap entirely. This is a major weakness, as prevailing methods in domain adaptation consider the unsupervised domain adaptation setting, where some labels in the target domain might not be known when training the model. 2. The method uses MMD as
Taxonomy
Topics: Time Series Analysis and Forecasting · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare
