Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention

George Fountzoulas

arXiv:2604.07969·cs.CL·May 14, 2026

Kathleen: Oscillator-Based Byte-Level Text Classification Without Tokenization or Attention

George Fountzoulas

PDF

TL;DR

Kathleen is a novel byte-level text classification model that operates without tokenization or attention, using frequency-domain processing and recurrent oscillators, achieving competitive accuracy with fewer parameters.

Contribution

The paper introduces new components and architecture innovations for byte-level text classification, eliminating the need for pretraining and tokenization while improving accuracy.

Findings

01

Achieves 88.5% on IMDB without pretraining

02

Matches or exceeds pretrained models on benchmarks

03

Uses fewer parameters (469K) than baseline models

Abstract

We present Kathleen, a text classification architecture that operates directly on raw UTF-8 bytes using frequency-domain processing -- requiring no tokenizer, no attention mechanism, and under 470K parameters. Kathleen introduces several novel components: (1) RecurrentOscillatorBanks -- damped sinusoid convolutions with temporal memory for O(L) sequence processing; (2) an FFT-Rotate Wavetable Encoder that maps all 256 byte values using a single learnable vector (256 floats); (3) PhaseHarmonics -- a sinusoidal non-linearity with just 6 learnable phase parameters (+2.6% accuracy, <0.001% of model parameters); (4) Content-Dependent Reverb with Positional Decay Modulation -- a temporal memory mechanism whose decay rate is jointly conditioned on input content and a learned position-indexed bias vector; (5) Token-Level Module Sequencer with consonance and dissonance interference channels.…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.