Spectral-Window Hybrid (SWH)

Vladimer Khasia

arXiv:2601.01313 · cs.LG · January 6, 2026

TL;DR

The Spectral-Window Hybrid (SWH) architecture runs a global spectral-convolution branch in parallel with local sliding-window attention to model long sequences efficiently, matching Transformer perplexity on short contexts while scaling linearly to extended sequences.

Contribution

SWH introduces a novel parallel architecture that decouples global and local sequence modeling, achieving efficient long-range context handling with linear complexity.

Findings

01

SWH matches Transformer perplexity on short sequences.

02

SWH scales linearly to longer sequences.

03

SWH models long-range dependencies without the quadratic cost of global attention.

Abstract

Scaling sequence modeling to extreme contexts requires balancing computational efficiency with representational expressivity. While Transformers provide precise retrieval via the attention mechanism, their quadratic $O(T^2)$ complexity limits their application to long-horizon tasks. In this work, we propose the Spectral-Window Hybrid (SWH), an architecture that decouples sequence modeling into two parallel streams: a global branch utilizing the Convolution Theorem to model long-range decay dynamics in $O(T \log T)$ time, and a local branch employing sliding-window attention for token interactions within a bounded context. By aggregating these representations, SWH avoids the computational bottleneck of global attention while retaining local precision. We demonstrate that SWH matches the perplexity of standard Transformers on short contexts while…
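
The two parallel streams described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the exponential-decay kernel, the per-channel `rates` parameter, the use of the input itself as queries, keys, and values, and the summation used to aggregate the branches are all assumptions made for illustration, and the function names (`fft_causal_conv`, `decay_kernel`, `sliding_window_attention`, `swh_block`) are hypothetical.

```python
import numpy as np

def fft_causal_conv(x, kernel):
    """Causal convolution of x (T, D) with kernel (T, D) along time,
    computed via the Convolution Theorem in O(T log T)."""
    T = x.shape[0]
    n = 2 * T  # zero-pad so the circular convolution equals the linear one
    X = np.fft.rfft(x, n=n, axis=0)
    K = np.fft.rfft(kernel, n=n, axis=0)
    return np.fft.irfft(X * K, n=n, axis=0)[:T]

def decay_kernel(T, rates):
    """Assumed kernel form: k[t, d] = exp(-rates[d] * t), one rate per channel."""
    t = np.arange(T)[:, None]
    return np.exp(-rates[None, :] * t)

def sliding_window_attention(q, k, v, window):
    """Causal softmax attention in which position t sees only the last
    `window` positions (including itself); cost is O(T * window)."""
    T, D = q.shape
    out = np.empty_like(v)
    for t in range(T):
        lo = max(0, t - window + 1)
        scores = k[lo:t + 1] @ q[t] / np.sqrt(D)
        weights = np.exp(scores - scores.max())  # numerically stable softmax
        out[t] = (weights / weights.sum()) @ v[lo:t + 1]
    return out

def swh_block(x, rates, window):
    """Run both branches on the same input and aggregate them.
    Summation is an assumption; the abstract only says 'aggregating'."""
    global_out = fft_causal_conv(x, decay_kernel(x.shape[0], rates))
    local_out = sliding_window_attention(x, x, x, window)
    return global_out + local_out

# Example: a sequence of 16 tokens with 4 channels and a window of 4.
rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
y = swh_block(x, rates=np.array([0.1, 0.5, 1.0, 2.0]), window=4)
```

In this sketch the global branch costs $O(T \log T)$ because of the FFT, and the local branch costs $O(T \cdot w)$ for a fixed window $w$, so neither branch forms the full $T \times T$ attention matrix.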

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Algorithms and Data Compression · Parallel Computing and Optimization Techniques