OLMA: One Loss for More Accurate Time Series Forecasting

Tianyi Shi; Zhu Meng; Yue Chen; Siyang Zheng; Fei Su; Jin Huang; Changrui Ren; Zhicheng Zhao

arXiv:2505.11567 · cs.LG · September 26, 2025

TL;DR

OLMA introduces a novel loss function and frequency domain supervision techniques to address noise and frequency bias in time series forecasting, significantly improving accuracy across multiple datasets.

Contribution

The paper proposes OLMA, a new loss function combined with frequency domain supervision, to reduce forecasting error bounds and mitigate neural network frequency bias in time series prediction.

Findings

1. OLMA improves forecasting accuracy on multiple datasets.
2. DFT reduces label entropy in most scenarios.
3. Frequency domain supervision enhances model performance.

Abstract

Time series forecasting faces two important but often overlooked challenges. Firstly, the inherent random noise in the time series labels sets a theoretical lower bound for the forecasting error, which is positively correlated with the entropy of the labels. Secondly, neural networks exhibit a frequency bias when modeling the state-space of time series, that is, the model performs well in learning certain frequency bands but poorly in others, thus restricting the overall forecasting performance. To address the first challenge, we prove a theorem that there exists a unitary transformation that can reduce the marginal entropy of multiple correlated Gaussian processes, thereby providing guidance for reducing the lower bound of forecasting error. Furthermore, experiments confirm that Discrete Fourier Transform (DFT) can reduce the entropy in the majority of scenarios. Correspondingly, to…
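A standard result helps explain the abstract's first claim. In the estimation counterpart of Fano's inequality (Cover and Thomas, Elements of Information Theory), any estimator X̂ computed from inputs Y satisfies E[(X − X̂(Y))²] ≥ e^{2h(X|Y)}/(2πe). Lower label entropy therefore permits a lower error floor. The sketch below assumes zero-mean, jointly Gaussian labels with a known covariance. It checks that, for two correlated channels, an orthonormal DFT lowers the sum of marginal entropies and leaves the joint entropy unchanged. The function names and the two-channel covariance are hypothetical illustrations, not the paper's code or its Theorem 1.

```python
import numpy as np


def marginal_entropy_sum(cov: np.ndarray) -> float:
    """Sum of marginal differential entropies (nats) of a real Gaussian vector."""
    variances = np.diag(cov)
    return float(np.sum(0.5 * np.log(2 * np.pi * np.e * variances)))


def joint_entropy(cov: np.ndarray) -> float:
    """Joint differential entropy (nats): 0.5 * log det(2*pi*e*cov)."""
    sign, logdet = np.linalg.slogdet(2 * np.pi * np.e * cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * float(logdet)


# Hypothetical labels: two channels with unit variance and correlation rho.
rho = 0.8
cov_time = np.array([[1.0, rho], [rho, 1.0]])

# Orthonormal 2-point DFT matrix; for n = 2 it is real: [[1, 1], [1, -1]] / sqrt(2).
F = np.real(np.fft.fft(np.eye(2), norm="ortho"))
cov_freq = F @ cov_time @ F.T  # covariance of the transformed labels F @ x

h_marg_time = marginal_entropy_sum(cov_time)
h_marg_freq = marginal_entropy_sum(cov_freq)
print(f"marginal entropy sum, time: {h_marg_time:.4f}, DFT: {h_marg_freq:.4f}")
print(f"joint entropy, time: {joint_entropy(cov_time):.4f}, DFT: {joint_entropy(cov_freq):.4f}")
```

For ρ = 0.8 the drop in the marginal sum is 0.5·log(1 − ρ²), about 0.51 nats. This is exactly the total correlation of the pair, because the transformed covariance is diagonal.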

Peer Reviews

Decision: Submitted to ICLR 2026

Reviewer 01 · Rating: 2 · Confidence: 5

Strengths

1. The theoretical derivations supporting the proposed approach are interesting and appear solid.
2. The joint use of frequency and spatial domains within a unified framework is appealing for time-series forecasting tasks.
3. The topic of time-series forecasting is highly relevant to the ICLR community.

Weaknesses

1. Recent studies have explored loss functions based on label transformation for time-series forecasting. In particular, FreDF [1] introduces a frequency-domain loss that applies the Discrete Fourier Transform (DFT) to both labels and predictions, minimizing their frequency-domain discrepancies. **The core formulation in the current paper (Eq. 11) appears conceptually and mathematically identical to Eq. 3 in [1].** However, **the paper does not acknowledge or discuss these prior contributions inc
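The sketch below shows a generic frequency-domain loss of the kind the reviewer attributes to FreDF, to make the comparison concrete. Several choices in it are assumptions made for illustration: the (batch, time, channels) layout, the orthonormal full DFT over time, the L1 penalty on complex differences, and the function names. It does not reproduce Eq. 3 of FreDF or Eq. 11 of OLMA.

```python
import numpy as np


def freq_l1_loss(pred: np.ndarray, target: np.ndarray, time_axis: int = 1) -> float:
    """Mean absolute difference between the orthonormal DFTs of pred and target.

    Arrays are assumed to have shape (batch, time, channels); the DFT runs over time.
    """
    # The DFT is linear, so DFT(pred) - DFT(target) == DFT(pred - target).
    diff = np.fft.fft(pred - target, axis=time_axis, norm="ortho")
    return float(np.mean(np.abs(diff)))


def freq_l2_loss(pred: np.ndarray, target: np.ndarray, time_axis: int = 1) -> float:
    """Mean squared magnitude of the same frequency-domain difference."""
    diff = np.fft.fft(pred - target, axis=time_axis, norm="ortho")
    return float(np.mean(np.abs(diff) ** 2))


rng = np.random.default_rng(0)
target = rng.standard_normal((4, 96, 7))                 # hypothetical labels
pred = target + 0.1 * rng.standard_normal(target.shape)  # hypothetical forecasts

mse_time = float(np.mean((pred - target) ** 2))
print(freq_l1_loss(pred, target), freq_l2_loss(pred, target), mse_time)
```

The orthonormal DFT is unitary, so Parseval's theorem makes `freq_l2_loss` equal to the time-domain MSE up to rounding. A frequency-domain objective therefore changes training only through a different norm (such as the L1 form above) or through per-frequency weighting.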

Reviewer 02 · Rating: 4 · Confidence: 4

Strengths

- The paper links channel correlations to data entropy and forecasting error. This is an interesting perspective.
- The proposed channel loss component $\mathcal{L}_{olma}^{(c)}$ is a novel idea.
- The experiments show that OLMA helps models learn low frequency components better (Figure 2).
- The method works on many models and datasets, showing good general use.
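One case connects channel correlations to a DFT across channels in a provable way. If every pair of channels shares the same correlation, the covariance is circulant, and the unitary DFT diagonalizes every circulant matrix. The sketch below assumes exactly that hypothetical equicorrelated structure. It does not reproduce the paper's definition of $\mathcal{L}_{olma}^{(c)}$. For real inputs, coefficients k and n − k are complex conjugates. A diagonal Hermitian covariance therefore means the coefficients are uncorrelated, not that all of them are independent.

```python
import numpy as np

n, sigma2, rho = 4, 2.0, 0.6
# Hypothetical equicorrelated channels: this covariance matrix is circulant.
cov = sigma2 * ((1 - rho) * np.eye(n) + rho * np.ones((n, n)))

# Unitary DFT matrix applied across the channel axis.
F = np.fft.fft(np.eye(n), norm="ortho")
cov_freq = F @ cov @ F.conj().T  # Hermitian covariance of F @ x

off_diag = cov_freq - np.diag(np.diag(cov_freq))
eigs = np.real(np.diag(cov_freq))
print(np.max(np.abs(off_diag)))  # close to zero: the DFT is this matrix's eigenbasis
print(eigs)  # approximately [5.6, 0.8, 0.8, 0.8]: sigma2*(1+(n-1)*rho), then sigma2*(1-rho)
```

In this special case the channel DFT coincides with the KLT. With unequal variances or non-uniform correlations it generally does not.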

Weaknesses

- The main theoretical claim is weak. Theorem 1 proves a unitary transform exists to reduce entropy. It does not prove that DFT is that transform. The paper seems to assume this without justification.
- The optimal transform for decorrelation is KLT (PCA), not DFT. Why was DFT chosen? This gap between theory and practice is a major problem.
- The paper's own results contradict its hypothesis. Figure 1 shows DFT increases entropy for the ECL dataset. But Table 1 shows OLMA improves performance on
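The reviewer's point about the KLT can be checked on a two-channel Gaussian. Every orthogonal transform preserves the log-determinant of the covariance, and with it the joint entropy. By Hadamard's inequality, the sum of log marginal variances is at least that log-determinant, with equality exactly when the transformed covariance is diagonal. The KLT (eigenvectors of the covariance) always attains the bound. A fixed DFT attains it only when it happens to diagonalize the covariance, and it can even raise the marginal sum. The covariances and helper names below are hypothetical.

```python
import numpy as np

# Orthonormal 2-point DFT (real for n = 2).
F = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)


def marginal_log_var_sum(cov: np.ndarray) -> float:
    # For a Gaussian, sum of marginal entropies = n/2*log(2*pi*e) + 0.5 * this value.
    return float(np.sum(np.log(np.diag(cov))))


def compare(cov: np.ndarray) -> dict:
    _, eigvecs = np.linalg.eigh(cov)
    K = eigvecs.T  # KLT / PCA: rows are the covariance eigenvectors
    return {
        "identity": marginal_log_var_sum(cov),
        "dft": marginal_log_var_sum(F @ cov @ F.T),
        "klt": marginal_log_var_sum(K @ cov @ K.T),
        "log_det": float(np.linalg.slogdet(cov)[1]),  # lower bound (Hadamard)
    }


correlated = compare(np.array([[2.0, 0.8], [0.8, 1.0]]))
uncorrelated = compare(np.array([[2.0, 0.0], [0.0, 1.0]]))
print(correlated)
print(uncorrelated)
```

In the correlated case the ordering is KLT < DFT < identity. In the uncorrelated case with unequal variances, the DFT mixes the channels and raises the marginal sum, while the KLT leaves it unchanged.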

Reviewer 03 · Rating: 4 · Confidence: 3

Strengths

1. This paper proposes a theoretical connection between prediction error and label entropy, providing a principled foundation for exploring time series forecasting.
2. The overall framework combines theory, loss design, and empirical validation in a coherent and easily interpretable manner.
3. The experimental results are comprehensive, spanning multiple datasets and various backbone models, and demonstrate consistent improvements.

Weaknesses

1. This paper assumes that $\hat{x}$ is an unbiased estimator of $x$, but this property cannot be guaranteed in practice for complex neural network regressors or noisy time series labels.
2. Theorem 1 assumes that multiple Gaussian random processes are independent and identically distributed, which is difficult to prove for correlated multivariate time series. Real-world variables (for example, in ECL or traffic data) often have strong interdependencies.
3. Although OLMA introduces a loss-base
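The bias-variance identity illustrates the first point. For any set of forecast errors, the mean squared error equals the squared mean error plus the error variance. A biased forecaster adds that bias term on top of whatever floor the label noise imposes, so a bound derived under unbiasedness no longer describes the whole error. The signal, noise level, and constant offset below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = np.sin(np.linspace(0, 4 * np.pi, 500))           # hypothetical clean signal
labels = truth + 0.3 * rng.standard_normal(truth.shape)  # noisy labels
pred = truth + 0.2  # hypothetical forecaster: right shape, constant offset (biased)

err = pred - labels
mse = float(np.mean(err ** 2))
bias_sq = float(np.mean(err)) ** 2
var = float(np.var(err))  # population variance (ddof=0), matching the identity below

# Identity for sample moments: mse == bias_sq + var, up to floating-point rounding.
print(f"MSE={mse:.4f}  bias^2={bias_sq:.4f}  variance={var:.4f}")
```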

Taxonomy

Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Forecasting Techniques and Applications

Methods: Focus