Unsupervised Layer-Wise Dynamic Test Time Adaptation for LLMs
Longhuan Xu, Cunjian Chen, Feng Yin

TL;DR
This paper introduces a layer-wise dynamic test-time adaptation method for large language models that adapts per prompt, using a hypernetwork to control learning rates and improving stability and performance during unsupervised, sample-specific TTA.
Contribution
It proposes a framework in which a hypernetwork modulates TTA strength at fine granularity (layer-wise), addressing instability in unsupervised, prompt-specific adaptation.
Findings
Enhanced stability of TTA across datasets and models
Improved generation quality with adaptive learning rates
Effective layer-wise scaling patterns learned by the hypernetwork
Abstract
Test-time adaptation (TTA) for large language models (LLMs) updates model parameters at inference time using signals available at deployment. This paper focuses on a common yet under-explored regime: unsupervised, sample-specific TTA, where the model adapts independently for each prompt using only the prompt itself, without gold answers or external supervision. Although appealing, naive unsupervised TTA with a fixed, handcrafted learning rate can be unstable: updates may overfit to prompt-specific statistics, drift from the desired answer distribution, and ultimately degrade generation quality. This failure mode is not surprising, as in this case TTA must adapt to a single prompt within only a few gradient steps, unlike standard training that averages updates over large datasets and long optimization horizons. Therefore, we propose layer-wise dynamic test-time adaptation, a framework…
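To make the idea of layer-wise modulation concrete, here is a minimal toy sketch, not the paper's implementation. The truncated abstract does not specify the adaptation loss or the hypernetwork, so the following are assumptions made only for illustration: the stand-in loss sum(p**2), the helper names toy_grad and adapt, and the hard-coded layer_scales that play the role of values a hypernetwork might produce. The sketch shows only the update pattern: a few gradient steps on a copy of the parameters, with each layer's step size set to a base learning rate times a per-layer scale.

```python
# Toy sketch: layer-wise scaled gradient steps for per-prompt adaptation.
# All names, the loss, and the scale values are hypothetical placeholders.

def toy_grad(layer_params):
    # Gradient of the stand-in loss sum(p**2), i.e. 2*p for each parameter.
    # A real system would use a gradient computed from the prompt instead.
    return [2.0 * p for p in layer_params]

def adapt(params, layer_scales, base_lr=0.1, steps=3):
    """Return an adapted copy of params (a list of per-layer parameter lists)."""
    adapted = [list(layer) for layer in params]  # leave the base model untouched
    for _ in range(steps):
        for i, layer in enumerate(adapted):
            lr = base_lr * layer_scales[i]  # effective learning rate for layer i
            grads = toy_grad(layer)
            adapted[i] = [p - lr * g for p, g in zip(layer, grads)]
    return adapted

base = [[1.0, -2.0], [0.5], [3.0]]
scales = [0.0, 1.0, 0.5]  # assumed values: 0.0 freezes a layer, 1.0 uses base_lr
adapted = adapt(base, scales)
```

With a scale of 0.0 the first layer is left unchanged, while the other layers move at different rates. Because adaptation runs on a copy, each prompt starts from the same base parameters, which matches the sample-specific setting described in the abstract.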
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Advanced Neural Network Applications
