TL;DR
Nirvana is a Specialized Generalist Model (SGM) whose task-aware memory mechanisms adapt to each input's domain at test time. It matches or surpasses LLM baselines on general benchmarks and performs well on domain-specific tasks such as MRI reconstruction.
Contribution
Introduces Nirvana, a Specialized Generalist Model (SGM) built around a Task-Aware Memory Trigger and a Specialized Memory Updater, which together enable domain adaptation at test time and strong performance across multiple specialized fields.
Findings
Nirvana matches or surpasses LLMs on general benchmarks.
Achieves lowest perplexity in biomedical, financial, and legal domains.
Outperforms traditional models in MRI image reconstruction.
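For readers unfamiliar with the perplexity metric cited above, the short sketch below shows the standard definition: the exponential of the mean negative log-probability the model assigns to the evaluated tokens. The function name and the sample values are illustrative assumptions, not taken from the paper.

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability) over the evaluated tokens."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

# Hypothetical example: a model that assigns probability 0.25 to every token
# is as uncertain as a uniform choice among four options.
uniform_over_four = [math.log(0.25)] * 4
print(perplexity(uniform_over_four))  # approximately 4
```

Lower is better: a model that assigns higher probability to the domain text it is evaluated on gets a lower perplexity.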
Abstract
Large Language Models (LLMs) excel at general language tasks but struggle in specialized domains. Specialized Generalist Models (SGMs) address this by preserving broad capabilities while adapting to target domains. However, existing architectures provide limited support for task-guided specialized memory mechanisms. In this work, we introduce Nirvana, an SGM featuring specialized memory, linear-time complexity, and test-time task information extraction. Central to Nirvana are: (1) Task-Aware Memory Trigger, which treats each input as a self-supervised fine-tuning task and adjusts task-related parameters on the fly; and (2) Specialized Memory Updater, which dynamically consolidates task-relevant context. Nirvana matches or surpasses LLM baselines on general benchmarks and achieves the lowest perplexity across specialized domains including…
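The idea of treating each input as a self-supervised fine-tuning task can be illustrated with a toy sketch. This is not Nirvana's Trigger or Updater, whose details the abstract does not give. It assumes a one-parameter model with a frozen "general" weight `w` and a per-input offset `delta` that is adjusted by a few gradient steps on a next-value prediction loss computed from the input alone. All names and values are hypothetical.

```python
def self_supervised_loss(seq, w, delta):
    """Mean squared error of predicting each element from the previous one."""
    pairs = list(zip(seq[:-1], seq[1:]))
    return sum(((w + delta) * x - y) ** 2 for x, y in pairs) / len(pairs)

def adapt_delta(seq, w, delta=0.0, lr=0.01, steps=5):
    """Adjust only the task-specific offset `delta`; the general weight `w` stays frozen."""
    pairs = list(zip(seq[:-1], seq[1:]))
    for _ in range(steps):
        # Gradient of the mean squared error with respect to delta.
        grad = sum(2 * ((w + delta) * x - y) * x for x, y in pairs) / len(pairs)
        delta -= lr * grad
    return delta

# Hypothetical input whose structure (each value doubles) differs from what
# the frozen weight w = 1.0 expects (each value repeats).
seq = [1.0, 2.0, 4.0, 8.0]
w = 1.0
delta = adapt_delta(seq, w)
print(self_supervised_loss(seq, w, 0.0))    # loss before adaptation
print(self_supervised_loss(seq, w, delta))  # smaller loss after adaptation
```

The frozen weight stands in for the model's broad capabilities, and the offset stands in for the task-related parameters adjusted on the fly. A real system would adapt far richer parameters and would reset or consolidate them between inputs.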
