Caracal: Causal Architecture via Spectral Mixing

Bingzheng Gan; Tianyi Zhang; Yusu Li; Jing Huang; Wei Shi; Yangkai Ding; Tao Yu

arXiv:2605.00292·cs.LG·May 8, 2026

TL;DR

Caracal introduces a Fourier-based architecture for long-sequence modeling that is scalable, efficient, and portable, addressing two limitations of attention-based large language models: the quadratic cost of attention and the limitations of positional encodings.

Contribution

It replaces attention with a spectral mixing module using FFT, enabling scalable, portable, and autoregressive long-sequence modeling without hardware-specific optimizations.

Findings

01

Caracal achieves performance competitive with Transformer and SSM baselines.

02

It runs in O(L log L) time in the sequence length L, compared with the quadratic cost of attention, which improves scalability to long sequences.

03

The model uses standard library operators rather than hardware-specific implementations, which makes it portable and easy to deploy.
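
As a rough illustration of the complexity finding, the sketch below compares the leading growth terms L^2 and L log2(L) at a few sequence lengths. The function names are hypothetical, constant factors and hidden dimensions are ignored, and the numbers are growth terms only, not measured costs of Caracal or of any attention implementation.

```python
import math


def quadratic_term(seq_len: int) -> float:
    # Leading term for pairwise attention scores: every position against every position.
    return float(seq_len * seq_len)


def fft_term(seq_len: int) -> float:
    # Leading term for an FFT over the sequence axis (base-2 logarithm, constants dropped).
    return seq_len * math.log2(seq_len)


def growth_ratio(seq_len: int) -> float:
    # How many times larger the quadratic term is; algebraically L / log2(L).
    return quadratic_term(seq_len) / fft_term(seq_len)


if __name__ == "__main__":
    for seq_len in (1024, 65536, 1048576):
        print(f"L={seq_len:>8}  L^2 / (L log2 L) = {growth_ratio(seq_len):,.1f}")
```

The ratio simplifies to L / log2(L), so the gap between the two terms keeps widening as the sequence length grows.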

Abstract

The scalability of Large Language Models to long sequences is hindered by the quadratic cost of attention and the limitations of positional encodings. To address these, we introduce Caracal, a novel architecture that replaces attention with a parameter-efficient, O(L log(L)) Multi-Head Fourier (MHF) module. Our contributions are threefold: (1) We leverage the Fast Fourier Transform (FFT) for sequence mixing, inherently addressing both bottlenecks mentioned above. (2) We apply a frequency-domain causal masking technique that enforces autoregressive capabilities via asymmetric padding and truncation, overcoming a critical barrier for Fourier-based generative models. (3) Unlike efficient models relying on hardware-specific implementations (e.g., Mamba), we use standard library operators. This ensures robust portability, eliminating common deployment barriers. Evaluations demonstrate that…
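
The abstract does not spell out the MHF module, so the following is only a minimal sketch of the general idea behind causal FFT mixing, not the authors' implementation. It assumes a per-channel filter of the same length as the sequence, zero-pads on the right to length 2L so the circular convolution cannot wrap around, multiplies in the frequency domain, and truncates back to the first L positions. The names `causal_fft_mix` and `causal_direct_mix` are hypothetical, and NumPy stands in for whichever standard library the paper uses.

```python
import numpy as np


def causal_fft_mix(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Mix a sequence causally with an FFT. x and kernel have shape (L, D)."""
    seq_len = x.shape[0]
    padded_len = 2 * seq_len  # assumption: right-pad to 2L to avoid wrap-around
    # rfft zero-pads at the end of the axis when n exceeds the input length.
    x_freq = np.fft.rfft(x, n=padded_len, axis=0)
    k_freq = np.fft.rfft(kernel, n=padded_len, axis=0)
    y = np.fft.irfft(x_freq * k_freq, n=padded_len, axis=0)
    # Keep the first L outputs: y[t] depends only on x[0..t].
    return y[:seq_len]


def causal_direct_mix(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """O(L^2) reference: y[t] = sum over s <= t of kernel[t - s] * x[s]."""
    seq_len = x.shape[0]
    y = np.zeros_like(x)
    for t in range(seq_len):
        for s in range(t + 1):
            y[t] += kernel[t - s] * x[s]
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((16, 3))
    kernel = rng.standard_normal((16, 3))
    y_fft = causal_fft_mix(x, kernel)
    print("matches direct causal sum:", np.allclose(y_fft, causal_direct_mix(x, kernel)))

    # Causality check: changing inputs from position 10 onward
    # should leave outputs before position 10 unchanged.
    x_future = x.copy()
    x_future[10:] += 5.0
    y_future = causal_fft_mix(x_future, kernel)
    print("past outputs unchanged:", np.allclose(y_fft[:10], y_future[:10]))
```

Without the padding, the FFT product would compute a circular convolution in which late positions leak into early outputs. The padding and truncation step is what makes the operation usable for autoregressive modeling in this sketch.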
