CPiRi: Channel Permutation-Invariant Relational Interaction for Multivariate Time Series Forecasting
Jiyuan Xu, Wenyu Zhang, Xin Jing, Shuai Chen, Shuai Zhang, Jiahao Nie

TL;DR
CPiRi introduces a permutation-invariant framework for multivariate time series forecasting that effectively models inter-channel dependencies without overfitting to channel order, enabling robust and adaptable predictions.
Contribution
The paper proposes a permutation-invariant approach that combines a spatio-temporal decoupled architecture with permutation-invariant regularization during training, supported by theoretical analysis and state-of-the-art empirical results.
Findings
State-of-the-art results on multiple benchmarks.
Stable performance under channel shuffling.
Strong generalization to unseen channels.
Abstract
Current methods for multivariate time series forecasting can be classified into channel-dependent and channel-independent models. Channel-dependent models learn cross-channel features but often overfit the channel ordering, which hampers adaptation when channels are added or reordered. Channel-independent models treat each channel in isolation to increase flexibility, yet this neglects inter-channel dependencies and limits performance. To address these limitations, we propose CPiRi, a channel permutation invariant (CPI) framework that infers cross-channel structure from data rather than memorizing a fixed ordering, enabling deployment in settings with structural and distributional co-drift without retraining. CPiRi couples a spatio-temporal decoupling architecture with a permutation-invariant regularization training strategy: a frozen pretrained temporal…
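The property the abstract relies on can be illustrated with a toy check: self-attention applied across channel tokens, with no positional encoding, produces outputs that follow the channels when they are reordered. The sketch below is not the CPiRi implementation. The function `channel_attention`, the identity query/key/value projections, and the random channel embeddings are all assumptions made for illustration.

```python
import math
import random


def channel_attention(tokens):
    """Toy single-head self-attention over channel tokens.

    Assumptions: identity Q/K/V projections and no positional encoding.
    tokens is a list of C vectors, one embedding per channel.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this channel against every channel.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
        z = sum(weights)
        out.append([sum(w / z * v[j] for w, v in zip(weights, tokens)) for j in range(d)])
    return out


def permute(rows, perm):
    """Reorder rows so that position i holds rows[perm[i]]."""
    return [rows[i] for i in perm]


rng = random.Random(0)
tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]  # 5 channels, dim 4
perm = [3, 0, 4, 1, 2]  # an arbitrary channel reordering

out_then_perm = permute(channel_attention(tokens), perm)
perm_then_out = channel_attention(permute(tokens, perm))

# Largest element-wise difference between the two orders of operation.
max_gap = max(
    abs(a - b)
    for r1, r2 in zip(out_then_perm, perm_then_out)
    for a, b in zip(r1, r2)
)
print("max gap:", max_gap)
```

The two results agree up to floating-point rounding, so each channel receives the same output whatever position it occupies. Adding a learned per-position embedding to the tokens would break this agreement, which is one way a model can come to depend on channel order.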
Peer Reviews
Decision: ICLR 2026 Poster
1. The problem framing is timely and interesting. The paper not only says “permutations matter” but builds an explicit diagnostic: train with fixed order, test with shuffled order, show catastrophic failure for several competitive models. 2. The proposed framework is reasonable and coherent, with the per-channel frozen temporal encoder feeding a permutation-aware spatial block, and the permutation-based training strategy reinforcing the intended behavior. 3. The reported improvements indicate t
1. My main concern is the reliance on a large pre-trained backbone. Table 4 shows that removing the pre-trained weights leads to a substantial drop in accuracy, which suggests that much of the gain comes from the foundation model rather than from the proposed permutation invariant interaction itself. However, in Table 1 the competing CD baselines are trained from scratch and do not benefit from comparable pre-training. This raises a fairness question: to what extent are the improvements due to t
1. Interesting motivation. 2. Clear and easy-to-follow writing. 3. Comprehensive theoretical analysis provides support for the proposed method.
1. Since this paper only uses Sundial as the temporal feature extractor, it lacks an explanation of why Sundial was chosen over other foundation models. Can this framework generalize to other pretrained models such as Chronos [1] or Moment [2]? 2. I appreciate that the paper uses high-dimensional datasets with channel heterogeneity. This is an interesting attempt for scalability analysis. However, could you also test the model on Time-HD [3]? 3. The main concern lies in efficiency (which the aut
1. Clear Problem Identification: The paper clearly diagnoses a critical flaw in existing CD models using a simple "channel shuffling" diagnostic test. This test reveals that many SOTA models rely on "positional memorization," leading to catastrophic performance collapse (e.g., >400% error increase for Informer) when channel order is changed. 2. Effective and Sound Design: The "spatio-temporal decoupling" is an elegant solution. It leverages the power of a robust, pre-trained temporal model (CI s
1. The framework's success is entirely dependent on the frozen Sundial encoder. The ablation study "w/o pretrained weights" results in "complete failure". This makes it difficult to separate the contribution of the novel CPiRi training strategy from the powerful priors of the (very large) foundation model it relies on. 2. The individual components are standard: a "standard Transformer encoder block" for the spatial module and a data augmentation technique for training. The novelty is in the clev
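The channel-shuffling diagnostic that several reviewers describe (fit with a fixed channel order, evaluate with the channels reordered) can be sketched on toy data. This is a hedged illustration, not the paper's evaluation protocol. The helper `shuffle_gap`, the two toy models, and the synthetic data are hypothetical, and the sketch assumes inputs and targets are shuffled with the same permutation.

```python
import random


def mse(pred, target):
    """Mean squared error over all samples and channels."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n


def shuffle_gap(model, inputs, targets, perm):
    """Return (mse_original_order, mse_shuffled_order) for a C-in, C-out model."""
    base = mse([model(x) for x in inputs], targets)
    shuffled_inputs = [[x[i] for i in perm] for x in inputs]
    shuffled_targets = [[y[i] for i in perm] for y in targets]
    shuffled = mse([model(x) for x in shuffled_inputs], shuffled_targets)
    return base, shuffled


rng = random.Random(0)
C = 4
# Fixed mixing weights tied to channel positions (stands in for "positional memorization").
W = [[rng.uniform(-1, 1) for _ in range(C)] for _ in range(C)]


def positional_model(x):
    # Output i depends on which position each input sits in.
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def order_agnostic_model(x):
    # Output i depends only on channel i and an order-free summary of all channels.
    mean = sum(x) / len(x)
    return [0.5 * v + 0.5 * mean for v in x]


inputs = [[rng.gauss(0, 1) for _ in range(C)] for _ in range(200)]
targets = [positional_model(x) for x in inputs]  # the positional model fits these exactly
perm = [2, 0, 3, 1]

pos_base, pos_shuf = shuffle_gap(positional_model, inputs, targets, perm)
agn_base, agn_shuf = shuffle_gap(order_agnostic_model, inputs, targets, perm)
print("positional model:", pos_base, pos_shuf)
print("order-agnostic model:", agn_base, agn_shuf)
```

In this toy setup the positional model is perfect in the original order and degrades once channels are reordered, while the order-agnostic model is less accurate but scores the same in both orders. The gap between the two error values is the quantity the diagnostic reports.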
Taxonomy
Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Traffic Prediction and Management Techniques
