Ouroboros: Dynamic Weight Generation for Recursive Transformers via Input-Conditioned LoRA Modulation

Jaber Jaber; Osama Jaber

arXiv:2604.02051 · cs.LG · April 3, 2026

TL;DR

Ouroboros introduces a dynamic weight generation method for recursive transformers, enabling input-dependent transformations at each recurrence step with minimal additional parameters.

Contribution

The paper presents a compact Controller hypernetwork that modulates frozen, SVD-initialized LoRA bases in a recursive transformer. Each recurrence step thereby becomes input-dependent, which lowers training loss relative to the baseline while adding few parameters.
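The modulation described here and in the abstract can be illustrated with a minimal NumPy sketch. It assumes the effective weight at a step is W + B diag(m) A, where B and A are frozen bases and m is the vector the Controller emits from the current hidden state. The sizes, the mean-pooling Controller, the choice of top singular vectors for the bases, and all names (`controller`, `modulated_linear`, `W_ctrl`) are hypothetical and not taken from the paper or its repository.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, rank = 16, 4  # hypothetical sizes, chosen only for illustration

# Frozen weight of one linear map inside the shared recurrent block.
W = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)

# Assumption: the frozen LoRA bases are the top-`rank` singular vectors of W.
# The abstract only says "SVD-initialized"; the exact recipe may differ.
U, S, Vt = np.linalg.svd(W)
B = U[:, :rank]   # (d_model, rank), frozen
A = Vt[:rank, :]  # (rank, d_model), frozen

# Hypothetical Controller: one linear map from the pooled hidden state
# to a rank-sized diagonal modulation vector.
W_ctrl = rng.standard_normal((rank, d_model)) * 0.01


def controller(h):
    """Map hidden states of shape (seq, d_model) to a vector of shape (rank,)."""
    pooled = h.mean(axis=0)
    return W_ctrl @ pooled


def modulated_linear(h, m):
    """Compute h @ (W + B diag(m) A).T without forming the full matrix."""
    base = h @ W.T
    low_rank = ((h @ A.T) * m) @ B.T  # scaling by m is the diag(m) product
    return base + low_rank


h = rng.standard_normal((5, d_model))  # toy hidden states for 5 tokens
m = controller(h)                      # depends on the input h
y = modulated_linear(h, m)
```

Because only `m` changes between steps and inputs, the trainable cost in this sketch is the Controller alone; the bases `B` and `A` stay fixed.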

Findings

01. Reduces training loss by 43.4% over the unmodified 17-layer baseline

02. Outperforms static per-step LoRA across depths and ranks

03. Gated recurrence is crucial for effectiveness

Abstract

Recursive transformers reuse a shared weight block across multiple depth steps, trading parameters for compute. A core limitation: every step applies the same transformation, preventing the model from composing distinct operations across depth. We present Ouroboros, a system that attaches a compact Controller hypernetwork to a recursive transformer block. The Controller observes the current hidden state, produces a per-step diagonal modulation vector, and applies it to frozen SVD-initialized LoRA bases, making each recurrence step input-dependent. We combine this with gated recurrence (bias-initialized to 88% retention) and per-step LayerNorm for stable deep iteration. On Qwen2.5-3B split into a Prelude/Recurrent/Coda architecture (17 of 36 layers retained), Ouroboros reduces training loss by 43.4% over the unmodified 17-layer baseline, recovering 51.3% of the performance gap caused by…
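The "bias-initialized to 88% retention" detail can be made concrete with a small worked example. The sketch below assumes a sigmoid gate g that blends the previous hidden state with the new candidate as g * h_prev + (1 - g) * h_candidate, so 88% retention at initialization means sigmoid(bias) = 0.88. The gate form and the names `gated_update` and `gate_logit` are assumptions for illustration; the abstract does not spell out the exact formulation, and the per-step LayerNorm is omitted here.

```python
import math


def logit(p):
    """Inverse of the sigmoid: the bias b such that sigmoid(b) == p."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


RETENTION = 0.88              # figure stated in the abstract
gate_bias = logit(RETENTION)  # roughly 1.99


def gated_update(h_prev, h_candidate, gate_logit=0.0):
    """Blend the previous state with the candidate (assumed gate form).

    `gate_logit` stands in for any learned, input-dependent term; it is
    zero here, which mimics the state of the gate at initialization.
    """
    g = sigmoid(gate_bias + gate_logit)
    return [g * p + (1.0 - g) * c for p, c in zip(h_prev, h_candidate)]


h_prev = [1.0, -2.0, 0.5]
h_cand = [0.0, 0.0, 0.0]
h_next = gated_update(h_prev, h_cand)  # keeps 88% of h_prev at init
```

Under this assumed form, each recurrence step initially changes the hidden state only slightly, which is one plausible reading of why the gate helps stabilize deep iteration.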

Code & Models

Repositories

RightNow-AI/ouroboros (GitHub)
