FSMLP: Modelling Channel Dependencies With Simplex Theory Based Multi-Layer Perceptions In Frequency Domain

Zhengnan Li; Haoxuan Li; Hao Wang; Jun Fang; Yuting Tan; Xilong Cheng; Yunxiao Qin

arXiv:2412.01654 · cs.LG · March 5, 2026

TL;DR

This paper introduces FSMLP, a frequency domain time series forecasting framework utilizing Simplex-MLP layers to reduce overfitting and improve accuracy by constraining weights within a simplex, validated on multiple datasets.

Contribution

The paper proposes a novel Simplex-MLP layer constrained within a simplex to mitigate overfitting in channel-wise MLPs, and develops the FSMLP framework for improved time series forecasting.

Findings

1. FSMLP achieves significant accuracy improvements on benchmark datasets.
2. Simplex-MLP reduces overfitting compared to standard MLPs.
3. Theoretical analysis shows lower Rademacher complexity for Simplex-MLP.
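
To make the third finding concrete, here is a textbook-style bound for a single simplex-constrained linear unit. It is offered as an illustration only and is not the paper's theorem; the symbols $d$ (number of channels), $n$ (number of samples), $x_{i,j}$ (value of channel $j$ in sample $i$), $\sigma_i$ (Rademacher variables) and $X_\infty$ are notation assumed here. Because a linear function over the simplex attains its supremum at a vertex, the supremum reduces to a maximum over the $d$ channels, and Massart's finite-class lemma gives:

$$
\hat{\mathfrak{R}}_n(\mathcal{F}_\Delta)
= \mathbb{E}_\sigma\Big[\sup_{w \in \Delta^{d-1}} \frac{1}{n}\sum_{i=1}^{n} \sigma_i \langle w, x_i \rangle\Big]
= \mathbb{E}_\sigma\Big[\max_{1 \le j \le d} \frac{1}{n}\sum_{i=1}^{n} \sigma_i x_{i,j}\Big]
\le X_\infty \sqrt{\frac{2\log d}{n}},
\qquad X_\infty = \max_{i,j} |x_{i,j}|.
$$

Under these assumptions the bound depends on the data range and grows only logarithmically with the number of channels, and no weight norm appears in it because the constraint fixes the weights' scale.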

Abstract

Time series forecasting (TSF) plays a crucial role in various domains, including web data analysis, energy consumption prediction, and weather forecasting. While Multi-Layer Perceptrons (MLPs) are lightweight and effective for capturing temporal dependencies, they are prone to overfitting when used to model inter-channel dependencies. In this paper, we investigate the overfitting problem in channel-wise MLPs using Rademacher complexity theory, revealing that extreme values in time series data exacerbate this issue. To mitigate it, we introduce a novel Simplex-MLP layer, where the weights are constrained within a standard simplex. This strategy encourages the model to learn simpler patterns, thereby reducing overfitting to extreme values. Based on the Simplex-MLP layer, we propose a novel Frequency Simplex MLP (FSMLP) framework for time series…
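
The simplex constraint described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation (the official repository is in PyTorch): the function names `to_simplex` and `simplex_channel_mix` are hypothetical, the mixing is applied in the time domain for simplicity, and only the absolute-value and square transformations mentioned in the reviews are shown, since the exact form of the log variant is not given on this page.

```python
import numpy as np


def to_simplex(raw, transform="abs", eps=1e-12):
    """Map each row of `raw` onto the standard simplex.

    Hypothetical helper: rows become non-negative and sum to one.
    """
    if transform == "abs":
        pos = np.abs(raw)
    elif transform == "square":
        pos = np.square(raw)
    else:
        raise ValueError(f"unknown transform: {transform}")
    pos = pos + eps  # guards against an all-zero row
    return pos / pos.sum(axis=1, keepdims=True)


def simplex_channel_mix(x, raw_weights, transform="abs"):
    """Mix the channels of x (shape [channels, length]).

    Each output channel is a convex combination of the input channels.
    """
    w = to_simplex(raw_weights, transform)
    return w @ x


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))      # 4 channels, 8 time steps
x[2, 3] = 50.0                   # inject one extreme value
raw = rng.normal(size=(4, 4))    # unconstrained parameters

w = to_simplex(raw)
y = simplex_channel_mix(x, raw)
```

Because every row of `w` is a convex combination, each mixed value stays between the smallest and largest input value at the same time step, so the layer cannot amplify an extreme value beyond its original magnitude.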

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

1. The idea of introducing a simplex constraint on MLP weights is conceptually interesting.
2. The experimental section covers multiple benchmark datasets, providing a broad empirical context for evaluation.

Weaknesses

1. In Table 1, it is unclear what type of “extreme values” the authors are referring to. The meaning of this term is ambiguous and requires clarification.
2. The content in Section 2.1, Time Series Forecasting, is not closely connected to the main topic discussed in this paper.
3. Regarding the description of the related work FreTS, the statements in lines 140–143 are incorrect. Furthermore, even on the authors’ own terms, the position expressed in lines 140–143 appears inconsistent with that i…

Reviewer 02 · Rating 6 · Confidence 5

Strengths

- The paper gives a solid theoretical and empirical motivation for the overfitting problem associated with traditional channel-wise MLPs due to extreme values, as summarized in Table 1 and visually reinforced in Figure 1, which shows overfitting trend disparities among methods (FSMLP, TimesNet, TSMixer, Autoformer).
- The simplex constraint is rigorously justified with Rademacher complexity bounds (Section 5, Theorem 2), and a detailed proof is given in the Appendix, explaining why the constrain…

Weaknesses

1. The math formulations and step-by-step derivations for projecting weights onto the simplex lack clarity around certain details, such as the computational complexity of each transformation and how they are implemented for large-scale matrices. The choices for $f_\mathrm{trans}$ (absolute, log, square) are described, but a more precise algorithmic statement or pseudocode for the entire weight update procedure, particularly for batch settings, is missing. This may hinder reproducibility and un…

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. A concise and general mechanism. The paper links outliers to weight-norm inflation and mitigates overfitting to data spikes by constraining channel-mixing weights within the simplex. This approach is easy to implement, interpretable as a convex combination across channels, and theoretically justified by showing that the simplex constraint yields a tighter Rademacher bound.
2. Broad and scalable empirical evidence. FSMLP remains competitive or achieves state-of-the-art results on long-horizon for…

Weaknesses

1. Weak geometric motivation and interpretation. Although the paper describes the weight constraint as a “geometric constraint” that restricts parameters to lie within a standard simplex, its discussion of geometry remains largely formal, focusing only on the non-negativity and sum-to-one convex-combination conditions. The authors do not further explore the meaning of this geometric structure for the overall method, nor do they provide visualization or geometric interpretation. For instance, in cl…

Code & Models

Repositories

fmlyd/fsmlp (PyTorch, official)

Taxonomy

Topics: Structural Health Monitoring Techniques