CAWN: Continuous Acoustic Wave Networks for Autoregressive Language Modeling
Dejan Čugalj, Aleksandar Jevremovic

TL;DR
CAWN is a continuous sequence-mixing architecture for language modeling that targets ultra-long contexts, avoiding the quadratic scaling of self-attention and the signal degradation seen in linear-time alternatives.
Contribution
It proposes a fully continuous, phase-based sequence-mixing mechanism, used in place of discrete matrix-based attention, with dual gating and frequency-dependent retention for scalable long-context language modeling.
Findings
Successfully scaled to a 150M-parameter model.
Achieved robust vocabulary acquisition and extended contextual denoising.
Retrieved information across 2,000,000 tokens with limited VRAM usage.
Abstract
Modern Large Language Models (LLMs) rely on Transformer self-attention, which scales quadratically with sequence length. Recent linear-time alternatives, like State Space Models (SSMs), often suffer from signal degradation over extended contexts. We introduce the Continuous Acoustic Wave Network (CAWN), a fully continuous sequence-mixing architecture. Instead of discrete matrix-based attention, CAWN projects hidden states into multi-headed complex-domain phasors and mixes the sequence through a causal Phase Accumulation mechanism. To prevent signal degradation over ultra-long contexts, we introduce a dual-gated Selective Phase Resonance mechanism incorporating Frequency-Dependent Retention, Hard-Threshold Gating via Straight-Through Estimation, and a Temporal Syntax Cache to capture short-term local dependencies. We also replace standard dense linear projections with…
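The abstract does not give CAWN's equations, so the following is only a minimal NumPy sketch of what a causal phase accumulation with per-head retention could look like. The names `to_phasors`, `phase_accumulate`, `w_amp`, `w_phase`, and `retention`, the softplus amplitude, and the exponential-decay form of retention are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np


def to_phasors(hidden, w_amp, w_phase):
    """Project real hidden states (T, d) to complex phasors (T, H).

    Hypothetical projection: one amplitude and one phase per head.
    """
    amplitude = np.log1p(np.exp(hidden @ w_amp))  # softplus keeps amplitudes positive
    phase = hidden @ w_phase                      # unconstrained phase angle in radians
    return amplitude * np.exp(1j * phase)


def phase_accumulate(hidden, w_amp, w_phase, retention):
    """Causal running sum of phasors with a per-head decay factor.

    retention has shape (H,) with values in (0, 1); this is an assumed
    stand-in for frequency-dependent retention. The state at step t
    depends only on steps 0..t, and the state itself has fixed size H.
    """
    phasors = to_phasors(hidden, w_amp, w_phase)
    state = np.zeros(phasors.shape[1], dtype=np.complex128)
    out = np.empty_like(phasors)
    for t in range(phasors.shape[0]):
        # Decay the accumulated wave, then superpose the new phasor.
        state = retention * state + phasors[t]
        out[t] = state
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = rng.normal(size=(8, 16))   # 8 tokens, hidden width 16
    w_amp = rng.normal(size=(16, 4))    # 4 heads
    w_phase = rng.normal(size=(16, 4))
    retention = np.array([0.99, 0.9, 0.7, 0.5])
    mixed = phase_accumulate(hidden, w_amp, w_phase, retention)
    print(mixed.shape, mixed.dtype)
```

Because the recurrence carries a fixed-size complex state per head, the cost of this sketch grows linearly with sequence length, in contrast to the pairwise score matrix of self-attention.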
