Merlin: Multi-View Representation Learning for Robust Multivariate Time Series Forecasting with Unfixed Missing Rates
Chengqing Yu, Fei Wang, Chuanguang Yang, Zezhi Shao, Tao Sun, Tangwen Qian, Wei Wei, Zhulin An, Yongjun Xu

TL;DR
Merlin is a novel multi-view representation learning framework that enhances the robustness of multivariate time series forecasting models against unfixed missing data rates by aligning semantics across incomplete and complete observations.
Contribution
The paper introduces Merlin, a multi-view representation learning approach that combines knowledge distillation with contrastive learning to make existing MTSF models more robust when the missing rate of the input is not fixed.
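To make the combination of knowledge distillation and contrastive learning concrete, here is a minimal sketch of how such a training objective could be assembled. It is not the authors' implementation: the function names, the use of mean squared error for the distillation term, the InfoNCE form of the contrastive term, and the weights `alpha` and `beta` are all assumptions chosen for illustration.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine(a, b):
    """Cosine similarity, with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed form): anchors[i] and positives[i] are two
    views of the same sample; every other pairing is treated as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(anchors)


def combined_loss(student_pred, target, student_repr, teacher_repr,
                  view_a, view_b, alpha=1.0, beta=1.0):
    """Hypothetical objective: forecasting error, plus a distillation term that
    pulls the student's representation (incomplete input) toward the frozen
    teacher's representation (complete input), plus a contrastive term that
    aligns representations of the same sample under two missing rates."""
    forecast = mse(student_pred, target)
    distill = mse(student_repr, teacher_repr)
    contrast = info_nce(view_a, view_b)
    return forecast + alpha * distill + beta * contrast
```

In this sketch the contrastive term is small when the i-th representation in `view_a` is most similar to the i-th representation in `view_b`, which is one way to express "semantic alignment" across missing rates.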
Findings
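Merlin improves the forecasting accuracy of existing models on four real-world datasets when inputs contain missing values.
Merlin aligns the semantics of incomplete observations at different missing rates with those of complete observations.
Experiments show Merlin outperforming existing forecasting models and imputation methods.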
Abstract
Multivariate Time Series Forecasting (MTSF) involves predicting future values of multiple interrelated time series. Recently, deep learning-based MTSF models have gained significant attention for their promising ability to mine semantics (global and local information) within MTS data. However, these models are pervasively susceptible to missing values caused by malfunctioning data collectors. These missing values not only disrupt the semantics of MTS, but their distribution also changes over time. Nevertheless, existing models lack robustness to such issues, leading to suboptimal forecasting performance. To this end, in this paper, we propose Multi-View Representation Learning (Merlin), which can help existing models achieve semantic alignment between incomplete observations with different missing rates and complete observations in MTS. Specifically, Merlin consists of two key modules:…
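As an illustration of what "incomplete observations with different missing rates" could look like in practice, the sketch below builds several masked views of one complete series. The masking scheme (independent random point masking with zero fill), the specific rates, and the names `mask_series` and `make_views` are assumptions for illustration; the text above does not specify how missing values are generated.

```python
import random


def mask_series(series, missing_rate, rng, fill=0.0):
    """Mask each value independently with probability `missing_rate`.

    `series` is a list of time steps, each a list of per-variable values.
    Returns (masked_series, keep_mask), where keep_mask is True for values
    that were observed and False for values replaced by `fill`.
    """
    masked, keep_mask = [], []
    for row in series:
        keep_row = [rng.random() >= missing_rate for _ in row]
        keep_mask.append(keep_row)
        masked.append([v if keep else fill for v, keep in zip(row, keep_row)])
    return masked, keep_mask


def make_views(series, missing_rates=(0.25, 0.5, 0.75, 0.9), seed=0):
    """Build one incomplete view of the same series per (assumed) missing rate."""
    rng = random.Random(seed)
    return {rate: mask_series(series, rate, rng)[0] for rate in missing_rates}


# Usage: a toy series with 6 time steps and 3 variables.
complete = [[float(t + v) for v in range(1, 4)] for t in range(6)]
views = make_views(complete)
```

Each entry in `views` shares the underlying series but hides a different fraction of it, which is the kind of input a student model would need to handle if the missing rate changes over time.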
Peer Reviews
Decision: Submitted to ICLR 2025
* The paper is organized.
* Time series forecasting with missing values is an important problem to study and the proposed method is technically sound.
* The experiment results showed the effectiveness of the proposed method.
* The technical novelty is limited as this paper simply combines knowledge distillation and multi-view-based contrastive learning based on STID framework (which is also existing). From the application perspective, the proposed combination is somewhat reasonable.
* The motivation for this study is not clear (see questions below).
* Certain details about the proposed method are not provided (see below questions).
- **Clear logic of the method**: The paper has a clear presentation of the logic behind the proposed method.
- **Intuitive method**: The proposed approach is intuitive and easy to follow.
- **Clear ablation study**: The ablation study is thorough, clearly demonstrating the effectiveness of each component of the proposed method.
- **Vague description of key notations and definitions**: Two crucial components of the loss function are vaguely defined, making it challenging for others to reimplement the method based solely on the method description.
  - Knowledge distillation: The teacher model is trained on complete observations, while the student model is trained on missing observations. However, the authors should clarify what constitutes a "complete observation" versus a "missing observation." Given that the focus is
1. The introduction of Merlin, which combines offline knowledge distillation with multi-view contrastive learning, is innovative. This framework efficiently addresses the challenge of unfixed missing rates in multivariate time series data, which is a common real-world issue.
2. The paper thoroughly validates the approach using four real-world datasets, demonstrating Merlin's superiority over existing models and imputation methods.
3. The results show that Merlin consistently outperforms traditi
1. The experiments only report 12-step forecasting results. It would be more convincing to include other horizons, such as 6 or 24 steps.
2. I have some concerns about the two-stage settings. To ensure fairness, should we keep the teacher and student models with the same architecture? I'm also interested in whether switching backbones and applying Merlin could improve forecasting with missing data.
3. The paper could benefit from a deeper exploration of scenarios where Merlin might underperform
Taxonomy
Topics: Time Series Analysis and Forecasting · Forecasting Techniques and Applications · Stock Market Forecasting Methods
