TPTT: Transforming Pretrained Transformers into Titans

Fabien Furfaro

arXiv:2506.17671 · cs.CL · September 3, 2025


TL;DR

TPTT is a framework that enhances pretrained Transformer models with linearized attention and memory gating, improving efficiency and accuracy for long-context NLP tasks without full retraining.

Contribution

The paper introduces TPTT, a method that combines linearized attention with memory gating and enables efficient fine-tuning of pretrained Transformers across a range of model sizes.
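To make "linearized attention plus memory gating" concrete, here is a minimal NumPy sketch. It assumes, as a simplification, that the gate is a single scalar `alpha` blending a causal linear-attention output with the model's original causal softmax-attention output. The elu(x)+1 feature map, the scalar gate, and every function name below are illustrative assumptions, not TPTT's actual code.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1: a common positive kernel feature map (assumed, not taken from TPTT)
    return np.where(x > 0, x + 1.0, np.exp(x))

def softmax_attention(q, k, v):
    # Standard causal softmax attention: cost grows as O(n^2) in sequence length n
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    w = np.exp(scores)
    return (w / w.sum(axis=1, keepdims=True)) @ v

def linear_attention(q, k, v):
    # Causal kernelized attention, written in masked form for brevity
    a = np.tril(feature_map(q) @ feature_map(k).T)
    return (a @ v) / a.sum(axis=1, keepdims=True)

def memory_as_gate(q, k, v, alpha):
    # Hypothetical MaG-style blend: alpha weights the linear-attention "memory" path
    return alpha * linear_attention(q, k, v) + (1.0 - alpha) * softmax_attention(q, k, v)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((6, 4)) for _ in range(3))
out = memory_as_gate(q, k, v, alpha=0.5)
print(out.shape)  # (6, 4)
```

With `alpha=0.0` the blend reduces exactly to the original softmax attention, which is one reason a gated design can start from a pretrained model's behavior and shift toward the linear path during fine-tuning.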

Findings

01

Up to a 20% improvement in Exact Match (EM) scores on the MMLU benchmark.

02

Feasibility of converting quadratic-attention models to linear-attention models.

03

Effective fine-tuning with modest computational resources.
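Finding 02 rests on a standard property of kernelized attention: once softmax is replaced by a feature map, the same causal output can be computed either with a full n x n matrix or with a fixed-size running state, so cost grows linearly in sequence length. The sketch below illustrates that equivalence, assuming the elu(x)+1 feature map of Katharopoulos et al. (2020) rather than whatever kernel LiZA actually uses; the function names are hypothetical.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1 keeps every feature positive; assumed here, not taken from TPTT
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention_masked(q, k, v):
    # O(n^2) form: builds the full causal n x n score matrix
    a = np.tril(feature_map(q) @ feature_map(k).T)
    return (a @ v) / a.sum(axis=1, keepdims=True)

def linear_attention_recurrent(q, k, v):
    # O(n) form: carries a fixed-size state instead of an n x n matrix
    fq, fk = feature_map(q), feature_map(k)
    s = np.zeros((fk.shape[1], v.shape[1]))  # running sum of phi(k_j) v_j^T
    z = np.zeros(fk.shape[1])                # running sum of phi(k_j)
    out = np.empty((q.shape[0], v.shape[1]))
    for i in range(q.shape[0]):
        s += np.outer(fk[i], v[i])
        z += fk[i]
        out[i] = (fq[i] @ s) / (fq[i] @ z)
    return out

rng = np.random.default_rng(1)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
print(np.allclose(linear_attention_masked(q, k, v), linear_attention_recurrent(q, k, v)))  # True
```

The recurrent state `s` has shape (d, d_v) regardless of how many tokens have been processed, which is what makes long-context inference cheaper than materializing the quadratic attention matrix.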

Abstract

Transformer-based large language models (LLMs) have achieved strong performance across many natural language processing tasks. Nonetheless, their quadratic computational and memory requirements, particularly in self-attention layers, pose challenges for efficient inference on long contexts and for deployment in resource-limited environments. We present TPTT (Transforming Pretrained Transformers into Titans), a framework designed to augment pretrained Transformers with linearized attention (LiZA) and internal memory gating via Memory as Gate (MaG), applied without full retraining. TPTT supports parameter-efficient fine-tuning (LoRA) and integrates with standard toolkits such as Hugging Face Transformers. We evaluated TPTT on several pretrained models, including Llama-1B, OlMoE-1B-7B, Qwen2.5-1.5B, Gemma3-270m, OpenELM-1.3B, and Mistral-7B, in order to assess applicability across…
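The abstract notes that TPTT supports LoRA for parameter-efficient fine-tuning. As a rough from-scratch illustration of why this keeps fine-tuning cheap, the sketch below wraps a frozen weight with a trainable low-rank update. It is not the Hugging Face PEFT API and not TPTT's configuration; the class name, rank, scaling, and initialization are assumptions chosen for illustration.

```python
import numpy as np

class LoRALinear:
    """Hypothetical layer: frozen weight W plus trainable update (alpha / rank) * B @ A."""

    def __init__(self, weight, rank, alpha, rng):
        out_features, in_features = weight.shape
        self.weight = weight                                         # frozen pretrained weight
        self.a = rng.standard_normal((rank, in_features)) * 0.01     # trainable
        self.b = np.zeros((out_features, rank))                      # trainable, zero-init
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction
        return x @ self.weight.T + self.scale * (x @ self.a.T) @ self.b.T

    def trainable_params(self):
        return self.a.size + self.b.size

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024))
layer = LoRALinear(w, rank=8, alpha=16, rng=rng)
x = rng.standard_normal((2, 1024))
print(np.allclose(layer(x), x @ w.T))    # True: B starts at zero
print(layer.trainable_params(), w.size)  # 16384 1048576
```

Here only about 1.6% of the layer's parameters are trainable, and because `B` is zero-initialized the wrapped layer initially reproduces the pretrained output exactly.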
