Nautile-370M: Spectral Memory Meets Attention in a Small Reasoning Model

Maixent Chenebaux

arXiv:2604.24809 · cs.LG · April 29, 2026

TL;DR

Nautile-370M is a 371-million-parameter language model that combines linear-time spectral sequence operators (SeqCond Attention, SCA) with standard attention, targeting efficient reasoning under strict parameter and inference budgets.

Contribution

The paper introduces Nautile-370M, a hybrid architecture in which two SCA layers alternate with one transformer layer, and proves that the SCA readout is at least as expressive as softmax attention while remaining linear-time.

Findings

01. The SCA readout can exactly retrieve any individual token from the prefix summary.

02. SCA can reproduce any softmax attention output, matching full self-attention.

03. Nautile-370M targets efficient reasoning under strict parameter and inference budgets, at 371 million parameters.

Abstract

We present Nautile-370M, a 371-million-parameter small language model designed for efficient reasoning under strict parameter and inference budgets. Nautile-370M uses a hybrid backbone in which two SeqCond Attention (SCA) layers, a linear-time spectral sequence operator inspired by SeqCondenser, alternate with one transformer layer. This design aims to retain the long-context efficiency and state-tracking benefits of structured sequential models while preserving the expressive token-to-token routing of attention. The model was trained on a single Cloud TPU v4-64 pod slice provided through the Google TPU Research Cloud (TRC) program; the subsequent reinforcement learning stage was carried out on a single NVIDIA DGX Spark. We prove that the SCA readout mechanism can exactly retrieve any individual token from the prefix summary and can reproduce any output of softmax attention as a special…
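The 2:1 interleaving described in the abstract can be pictured as a repeating layer schedule. The following is only an illustrative sketch, not the author's implementation: the function name `hybrid_layer_schedule`, the labels `"sca"` and `"attention"`, the choice to place the two SCA layers before the transformer layer within each group, and the depth of 4 groups are all assumptions made for the example; the abstract does not state them.

```python
from collections import Counter


def hybrid_layer_schedule(num_blocks: int,
                          sca_per_block: int = 2,
                          attn_per_block: int = 1) -> list[str]:
    """Return layer-type labels for a backbone that repeats one group.

    Each group holds `sca_per_block` SCA layers followed by
    `attn_per_block` attention layers. The ordering inside a group
    is an assumption made for illustration.
    """
    block = ["sca"] * sca_per_block + ["attention"] * attn_per_block
    return block * num_blocks


# num_blocks=4 is a hypothetical depth, not a figure from the paper.
schedule = hybrid_layer_schedule(num_blocks=4)
counts = Counter(schedule)

print(schedule)
print(dict(counts))
```

Under these assumptions, two thirds of the layers are linear-time SCA layers and one third are attention layers, which is the ratio the abstract describes.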

Code & Models

Models

trickstr-ai/nautile-370m (Hugging Face model, 24 downloads, 3 likes)