FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain
Zhengnan Li, Haoxuan Li, Hao Wang, Jun Fang, Yuting Tan, Xilong Cheng, Yunxiao Qin

TL;DR
This paper introduces FSMLP, a frequency-domain time series forecasting framework built on Simplex-MLP layers, whose weights are constrained within a simplex to reduce overfitting and improve accuracy; the approach is validated on multiple datasets.
Contribution
The paper proposes a novel Simplex-MLP layer constrained within a simplex to mitigate overfitting in channel-wise MLPs, and develops the FSMLP framework for improved time series forecasting.
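To make the constraint concrete, here is a minimal NumPy sketch of a channel-mixing layer whose weight rows lie in the standard simplex. It assumes the constraint is enforced by reparameterization. An unconstrained matrix `v` is passed through a non-negative transform (`abs` or `square`, two of the f_trans choices mentioned in the reviews), and each row is then normalized to sum to one. The class name `SimplexMLP`, the helper `to_simplex_rows`, and the `(batch, channels, length)` layout are illustrative assumptions, not the authors' code. The log variant is omitted because its exact form is not given here.

```python
import numpy as np


def to_simplex_rows(v, f_trans="abs", eps=1e-12):
    """Map an unconstrained matrix to one whose rows lie in the standard simplex.

    f_trans makes entries non-negative ("abs" or "square"); each row is then
    divided by its sum so that it adds up to one.
    """
    if f_trans == "abs":
        t = np.abs(v)
    elif f_trans == "square":
        t = np.square(v)
    else:
        raise ValueError(f"unsupported f_trans: {f_trans!r}")
    # Guard against an all-zero row; such a row simply stays zero.
    return t / np.maximum(t.sum(axis=-1, keepdims=True), eps)


class SimplexMLP:
    """Hypothetical channel-mixing layer: y[b, c, l] = sum_k W[c, k] * x[b, k, l]."""

    def __init__(self, n_channels, f_trans="abs", seed=0):
        rng = np.random.default_rng(seed)
        # Unconstrained parameters; the simplex constraint is applied on the fly.
        self.v = rng.normal(size=(n_channels, n_channels))
        self.f_trans = f_trans

    @property
    def weight(self):
        return to_simplex_rows(self.v, self.f_trans)

    def __call__(self, x):
        # x has shape (batch, channels, length); mix along the channel axis.
        return np.einsum("ck,bkl->bcl", self.weight, x)


layer = SimplexMLP(n_channels=4)
x = np.random.default_rng(1).normal(size=(2, 4, 16))
y = layer(x)
print(y.shape)                   # (2, 4, 16)
print(layer.weight.sum(axis=1))  # each entry is 1 up to rounding
```

In a gradient-based framework the same mapping would sit inside the forward pass. The optimizer then updates `v` freely while the effective weights always stay on the simplex.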
Findings
FSMLP achieves significant accuracy improvements on benchmark datasets.
Simplex-MLP reduces overfitting compared to standard MLPs.
Theoretical analysis shows lower Rademacher complexity for Simplex-MLP.
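The Rademacher finding can be illustrated numerically using the standard definition of empirical Rademacher complexity, $\hat{R}_S(\mathcal{F}) = \mathbb{E}_\sigma[\sup_{f\in\mathcal{F}} \frac{1}{n}\sum_i \sigma_i f(x_i)]$. For linear predictors with weights in the standard simplex, the supremum is attained at a vertex, so Massart's lemma gives the bound $\max_{i,j}|x_{ij}|\sqrt{2\ln d / n}$, which grows directly with the largest absolute input value. The sketch below is a Monte Carlo illustration of this textbook setting, using a single linear output, Gaussian toy data, and one injected spike. It is not a reproduction of the paper's own theorem, and the function names are hypothetical.

```python
import numpy as np


def rademacher_linear(X, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of x -> <w, x>.

    Returns (simplex, ball): w restricted to the standard simplex, and w
    restricted to the unit L2 ball. X has shape (n_samples, d).
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    corr = sigma @ X / n  # row r: (1/n) * sum_i sigma_ri * x_i
    # A linear function over the simplex peaks at a vertex e_j.
    simplex = corr.max(axis=1).mean()
    # Over the unit L2 ball the supremum equals the Euclidean norm.
    ball = np.linalg.norm(corr, axis=1).mean()
    return simplex, ball


def massart_bound_simplex(X):
    """Massart's lemma bound for the simplex class: max|x_ij| * sqrt(2 ln d / n)."""
    n, d = X.shape
    return np.abs(X).max() * np.sqrt(2.0 * np.log(d) / n)


rng = np.random.default_rng(42)
X = rng.normal(size=(64, 8))
X_spiked = X.copy()
X_spiked[0, 0] = 50.0  # a single extreme value
for name, data in [("clean", X), ("spiked", X_spiked)]:
    s, b = rademacher_linear(data)
    print(name, f"simplex={s:.3f} ball={b:.3f} bound={massart_bound_simplex(data):.3f}")
```

The unit simplex lies inside the unit L2 ball, so for the same sign draws the simplex estimate can never exceed the ball estimate. Injecting one spike raises both estimates and the Massart bound.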
Abstract
Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting. While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies. In this paper, we investigate the overfitting problem in channel-wise MLPs using Rademacher complexity theory, revealing that extreme values in time series data exacerbate this issue. To mitigate this issue, we introduce a novel Simplex-MLP layer, whose weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns, thereby reducing overfitting to extreme values. Based on the Simplex-MLP layer, we propose a novel Frequency Simplex MLP (FSMLP) framework for time series…
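The abstract places the Simplex-MLP in the frequency domain but, as excerpted here, does not state how the spectrum is mixed. The sketch below assumes one simplex-constrained channel-mixing matrix per rFFT bin. Each matrix is applied to the complex spectrum before `np.fft.irfft` transforms back. The function names and the per-bin parameterization are hypothetical.

```python
import numpy as np


def simplex_rows(v):
    """Non-negative rows that sum to one (abs-then-normalize, an assumption)."""
    t = np.abs(v)
    return t / t.sum(axis=-1, keepdims=True)


def frequency_channel_mix(x, v):
    """Mix channels separately at every rFFT bin (hypothetical layout).

    x: real array of shape (channels, length)
    v: unconstrained parameters of shape (n_freq, channels, channels),
       with n_freq = length // 2 + 1
    """
    spec = np.fft.rfft(x, axis=-1)            # (channels, n_freq), complex
    w = simplex_rows(v)                       # one simplex-row matrix per bin
    mixed = np.einsum("fck,kf->cf", w, spec)  # mixed[c, f] = sum_k w[f, c, k] * spec[k, f]
    return np.fft.irfft(mixed, n=x.shape[-1], axis=-1)


rng = np.random.default_rng(0)
channels, length = 3, 32
x = rng.normal(size=(channels, length))
v = rng.normal(size=(length // 2 + 1, channels, channels))
y = frequency_channel_mix(x, v)
print(y.shape)  # (3, 32)
```

Suppose instead that a single real matrix were shared by all bins. Mixing in the frequency domain would then be identical to mixing in the time domain, because both the FFT and channel mixing are linear. Per-frequency weights are what let this sketch treat different frequency bands differently.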
Peer Reviews
Decision: Submitted to ICLR 2026
1. The idea of introducing a simplex constraint on MLP weights is conceptually interesting. 2. The experimental section covers multiple benchmark datasets, providing a broad empirical context for evaluation.
1. In Table 1, it is unclear what type of “extreme values” the authors are referring to. The meaning of this term is ambiguous and requires clarification. 2. The content of Section 2.1 (Time Series Forecasting) is not closely connected to the main topic discussed in this paper. 3. Regarding the description of the related work FreTS, the statements in lines 140–143 are incorrect. Furthermore, even on the authors’ own terms, the position expressed in lines 140–143 appears inconsistent with that i
- The paper gives a solid theoretical and empirical motivation for the overfitting problem associated with traditional channel-wise MLPs due to extreme values, as summarized in Table 1 and visually reinforced in Figure 1, which shows overfitting trend disparities among methods (FSMLP, TimesNet, TSMixer, Autoformer). - The simplex constraint is rigorously justified with Rademacher complexity bounds (Section 5, Theorem 2), and a detailed proof is given in the Appendix, explaining why the constrain
1. The math formulations and step-by-step derivations for projecting weights onto the simplex lack clarity around certain details, such as the computational complexity of each transformation and how they are implemented for large-scale matrices. The choices for $$f_\mathrm{trans}$$ (absolute, log, square) are described, but a more precise algorithmic statement or pseudocode for the entire weight update procedure, particularly for batch settings, is missing. This may hinder reproducibility and un
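This review asks for an explicit algorithm and its cost. One well-known way to keep weight rows on the simplex during training is projected gradient descent with the sort-based Euclidean projection (as described by Duchi et al., 2008). It is swapped in here for illustration and is not necessarily what FSMLP does, since the paper describes f_trans transforms instead. The name `project_rows_to_simplex` is hypothetical. The function handles a whole weight matrix at once and costs O(d log d) per row because of the sort.

```python
import numpy as np


def project_rows_to_simplex(V):
    """Euclidean projection of each row of a 2-D array onto the standard simplex.

    Sort-based method: O(d log d) per row for d columns.
    """
    V = np.asarray(V, dtype=float)
    if V.ndim != 2:
        raise ValueError("expected a 2-D array (rows are projected independently)")
    n, d = V.shape
    U = -np.sort(-V, axis=1)                  # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0          # running sums minus the target total of 1
    j = np.arange(1, d + 1)
    support = U - css / j > 0                 # True on a prefix of each sorted row
    rho = support.sum(axis=1)                 # support size, always >= 1
    theta = css[np.arange(n), rho - 1] / rho  # per-row threshold
    return np.maximum(V - theta[:, None], 0.0)


rng = np.random.default_rng(0)
W = project_rows_to_simplex(3.0 * rng.normal(size=(4, 6)))
print(W.min() >= 0, np.allclose(W.sum(axis=1), 1.0))  # True True
print(project_rows_to_simplex([[1.0, 1.0]]))           # [[0.5 0.5]]
```

A training step would then read `W = project_rows_to_simplex(W - lr * grad)`, where `lr` and `grad` stand for a learning rate and the loss gradient for the current batch.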
1. A concise and general mechanism. The paper links outliers to weight-norm inflation and mitigates overfitting to data spikes by constraining channel-mixing weights within the simplex. This approach is easy to implement, interpretable as a convex combination across channels, and theoretically justified by showing that the simplex constraint yields a tighter Rademacher bound. 2. Broad and scalable empirical evidence. FSMLP remains competitive or achieves state-of-the-art results on long-horizon for
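The convex-combination reading mentioned in this review can be checked directly. If every row of the mixing matrix is non-negative and sums to one, each output channel at each time step lies between the smallest and largest input channel at that step. A spike in one channel can therefore be diluted but never amplified. The helper names below are illustrative, and the data are random toy values.

```python
import numpy as np


def convex_mix(w, x):
    """Mix channels with a row-stochastic w: y[c, t] = sum_k w[c, k] * x[k, t]."""
    if np.any(w < 0) or not np.allclose(w.sum(axis=1), 1.0):
        raise ValueError("every row of w must lie on the standard simplex")
    return w @ x


def within_envelope(y, x, tol=1e-9):
    """True if each output lies between the min and max input channel at that time step."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return bool(np.all((y >= lo - tol) & (y <= hi + tol)))


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 10))
x[2, 5] = 25.0                                    # an extreme value in channel 2
w = np.abs(rng.normal(size=(4, 4)))
w /= w.sum(axis=1, keepdims=True)                 # convex (simplex) rows
print(within_envelope(convex_mix(w, x), x))       # True
w_free = np.tile([0.0, 0.0, 2.0, 0.0], (4, 1))    # unconstrained rows that double channel 2
print(within_envelope(w_free @ x, x))             # False
```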
1. Weak geometric motivation and interpretation. Although the paper describes the weight constraint as a “geometric constraint” that restricts parameters to lie within a standard simplex, its discussion of geometry remains largely formal, focusing only on the non-negativity and sum-to-one convex-combination conditions. The authors do not further explore the meaning of this geometric structure for the overall method, nor do they provide visualization or geometric interpretation. For instance, in cl
