# Filter then Attend: Improving attention-based Time Series Forecasting with Spectral Filtering

**Authors:** Elisha Dayag, Nhat Thanh Van Tran, Jack Xin

arXiv: 2508.20206 · 2026-05-13

## TL;DR

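This paper places learnable spectral filters at the start of transformer-based models for long time-series forecasting (LTSF). The filters add only about 1,000 parameters, yield a 5-10% relative improvement in forecasting performance in multiple instances, and allow the embedding dimension to be reduced.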

## Contribution

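The paper shows that adding a lightweight learnable filter (about 1,000 additional parameters) in front of a transformer-based forecaster improves its LTSF performance and permits a smaller embedding dimension, giving models that are both smaller and more effective than their non-filtering base models.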

## Key findings

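- Adding filters gives a 5-10% relative improvement in forecasting performance in multiple instances, at a cost of roughly 1,000 extra parameters.
- With filters added, the embedding dimension can be decreased, so the resulting models are smaller than their non-filtering base models while performing better.
- Synthetic experiments are used to analyze how the filters help transformer-based models use the full frequency spectrum, rather than favoring low frequencies.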
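
To make the idea of a learnable frequency filter concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the class name `SpectralFilter`, the one-complex-weight-per-frequency-bin parameterization, and the naive DFT helpers are all hypothetical choices made for this example. In a real model the weights would be trained jointly with the transformer that consumes the filtered series; here they are set by hand.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (fine for a small demo)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT matching dft() above."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


class SpectralFilter:
    """Hypothetical filter: one complex weight per frequency bin.

    Assumption for illustration only; the paper's exact parameterization
    is not given in this summary.
    """

    def __init__(self, length):
        self.length = length
        # All-pass initialisation: the filter starts as the identity map.
        # Training would update these weights by gradient descent.
        self.weights = [1 + 0j] * length

    def __call__(self, series):
        spectrum = dft(series)
        filtered = [w * c for w, c in zip(self.weights, spectrum)]
        # Keep the real part so the output is a real-valued series
        # even if the weights are not conjugate-symmetric.
        return [v.real for v in idft(filtered)]


# Usage: a slow sine plus a faster, weaker sine over 16 time steps.
n = 16
low = [math.sin(2 * math.pi * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
series = [a + b for a, b in zip(low, high)]

filt = SpectralFilter(n)
# Suppress frequency bin 5 and its mirror bin n - 5.
filt.weights[5] = 0j
filt.weights[n - 5] = 0j
filtered_series = filt(series)  # what a downstream transformer would receive
```

With the weights at their all-pass initial values the filter returns its input unchanged; zeroing a bin and its mirror removes that sinusoidal component. A learned filter would instead scale and phase-shift each frequency by whatever amounts reduce the forecasting loss.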

## Abstract

Transformer-based models are at the forefront of long time-series forecasting (LTSF). While these models achieve state-of-the-art results in many cases, they suffer from a bias toward low frequencies in the data and from high computational and memory requirements. Recent work has established that learnable frequency filters can be an integral part of a deep forecasting model by enhancing the model's spectral utilization. These works use a multilayer perceptron to process their filtered signals and thus do not solve the issues found with transformer-based models. In this paper, we establish that adding a filter to the beginning of transformer-based models enhances their performance in long time-series forecasting. We add learnable filters, which contribute only $\approx 1000$ additional parameters, to several transformer-based models and observe in multiple instances a 5-10% relative improvement in forecasting performance. Additionally, we find that with filters added, we are able to decrease the embedding dimension of our models, resulting in transformer-based architectures that are both smaller and more effective than their non-filtering base models. We also conduct synthetic experiments to analyze how the filters enable transformer-based models to better utilize the full spectrum for forecasting.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20206/full.md

## Figures

23 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20206/full.md

## References

26 references — full list in the complete paper: https://tomesphere.com/paper/2508.20206/full.md

---
Source: https://tomesphere.com/paper/2508.20206