Nirvana: A Specialized Generalist Model With Task-Aware Memory Mechanism

Yuhua Jiang; Shuang Cheng; Yihao Liu; Ermo Hua; Che Jiang; Weigao Sun; Yu Cheng; Feifei Gao; Biqing Qi; Bowen Zhou

arXiv:2510.26083 · cs.LG · April 9, 2026

TL;DR

Nirvana is a specialized generalist model whose task-aware memory mechanism adapts to each input's domain at test time. It matches or surpasses LLM baselines on general benchmarks and performs well on domain-specific tasks such as MRI reconstruction.

Contribution

Introduces Nirvana, a Specialized Generalist Model (SGM) built around a Task-Aware Memory Trigger and a Specialized Memory Updater. Together they let the model adapt to a target domain while keeping its general capabilities.

Findings

01  Nirvana matches or surpasses LLM baselines on general benchmarks.

02  Nirvana achieves the lowest perplexity in the biomedical, financial, and legal domains.

03  Nirvana outperforms traditional models on MRI image reconstruction.

Abstract

Large Language Models (LLMs) excel at general language tasks but struggle in specialized domains. Specialized Generalist Models (SGMs) address this by preserving broad capabilities while adapting to target domains. However, existing architectures provide limited support for task-guided specialized memory mechanisms. In this work, we introduce Nirvana, an SGM featuring specialized memory, linear-time complexity, and test-time task information extraction. Central to Nirvana are: (1) Task-Aware Memory Trigger ($Trigger$), which treats each input as a self-supervised fine-tuning task and adjusts task-related parameters on the fly; and (2) Specialized Memory Updater ($Updater$), which dynamically consolidates task-relevant context. Nirvana matches or surpasses LLM baselines on general benchmarks and achieves the lowest perplexity across specialized domains including…
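
The idea of treating each input as a self-supervised fine-tuning task can be illustrated with a toy example. The sketch below is not the paper's $Trigger$ and makes several assumptions of its own: a single scalar task parameter `a`, a next-value prediction objective, plain gradient descent, and the hypothetical helper names `self_supervised_loss` and `adapt_on_input`. It only shows the general pattern of adjusting a parameter from the input itself, with no labels.

```python
def self_supervised_loss(seq, a):
    """Mean squared error of predicting seq[t+1] as a * seq[t]."""
    pairs = list(zip(seq[:-1], seq[1:]))
    return sum((a * x - y) ** 2 for x, y in pairs) / len(pairs)


def adapt_on_input(seq, a, lr=0.5, steps=20):
    """Take a few gradient steps on `a` using only the input sequence itself.

    Assumption: the "task-related parameter" is one scalar; a real model
    would adapt a much larger set of parameters.
    """
    pairs = list(zip(seq[:-1], seq[1:]))
    for _ in range(steps):
        # Gradient of the mean squared error with respect to a.
        grad = sum(2.0 * (a * x - y) * x for x, y in pairs) / len(pairs)
        a -= lr * grad
    return a


# Toy "domain": a sequence that shrinks by a factor of 0.9 at every step.
seq = [0.9 ** t for t in range(20)]
a0 = 0.0                      # generic starting value of the task parameter
a1 = adapt_on_input(seq, a0)  # value after adapting to this one input
print(self_supervised_loss(seq, a0), self_supervised_loss(seq, a1))
```

Running it prints the prediction loss before and after adaptation. The second number should be smaller, because the parameter has moved toward the decay rate that generated the sequence.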

Code & Models

Repositories

YuhuaJiang2002/Nirvana (GitHub)
