ChameleonLLM: Batch-Aware Dynamic Low-Rank Adaptation via Inference-Time Clusters
Kamer Ali Yuksel, Hassan Sawaf

TL;DR
ChameleonLLM introduces a dynamic, inference-time adaptation framework for large language models that uses batch-aware clustering and low-rank updates to improve performance without additional model maintenance.
Contribution
It proposes a novel method for real-time LLM adaptation using clustering and hyper-networks, outperforming traditional fine-tuning approaches like LoRA.
Findings
Outperforms conventional LoRA methods in experiments.
Eliminates the need for multiple expert models.
Provides a versatile, adaptive inference solution.
Abstract
Recent advances in large language models (LLMs) have shown remarkable performance across diverse tasks. However, these models are typically deployed with fixed weights, which limits their ability to adapt dynamically to the variability inherent in real-world data during inference. This paper introduces ChameleonLLM, a novel framework that enables inference-time adaptation of LLMs by leveraging batch-aware clustering and on-the-fly generation of low-rank updates. Unlike traditional fine-tuning approaches such as Low-Rank Adaptation (LoRA) or methods that rely on a fixed set of pre-learned uniforms (changeable masks), our method dynamically generates adaptive modifications to the decoder weights based on the aggregated statistics of clustered batches. By intelligently grouping similar inputs and computing context-aware low-rank updates via a hyper-network, ChameleonLLM achieves…
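The pipeline the abstract describes (cluster the batch, aggregate per-cluster statistics, have a hyper-network emit a low-rank update to the decoder weights) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Everything beyond what the abstract states is an assumption made for illustration: the use of k-means for clustering, the cluster mean as the "aggregated statistic", the one-hidden-layer hyper-network with random untrained weights, and all names (`kmeans`, `HyperNetwork`, `adapted_forward`).

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain k-means; an assumed stand-in for the paper's batch clustering."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = x[labels == j].mean(axis=0)
    return labels

class HyperNetwork:
    """Hypothetical hyper-network: cluster statistics -> low-rank factors A, B."""
    def __init__(self, d_emb, d_out, d_in, rank, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.shape_a = (d_out, rank)
        self.shape_b = (rank, d_in)
        # Random, untrained weights; a real system would learn these.
        self.w1 = rng.normal(0, 0.02, (d_emb, hidden))
        self.w_a = rng.normal(0, 0.02, (hidden, d_out * rank))
        self.w_b = rng.normal(0, 0.02, (hidden, rank * d_in))

    def __call__(self, stats):
        h = np.tanh(stats @ self.w1)
        a = (h @ self.w_a).reshape(self.shape_a)
        b = (h @ self.w_b).reshape(self.shape_b)
        return a, b

def adapted_forward(hidden, embeddings, w_base, hyper, k):
    """Apply a per-cluster low-rank update to the decoder weight w_base."""
    labels = kmeans(embeddings, k)
    out = np.empty((hidden.shape[0], w_base.shape[0]))
    for j in np.unique(labels):
        mask = labels == j
        stats = embeddings[mask].mean(axis=0)   # assumed aggregated statistic
        a, b = hyper(stats)                     # low-rank factors for this cluster
        out[mask] = hidden[mask] @ (w_base + a @ b).T
    return out, labels

# Toy usage with made-up sizes: 8 inputs, 16-dim features, 32 output logits.
rng = np.random.default_rng(1)
emb = rng.normal(size=(8, 16))
hid = rng.normal(size=(8, 16))
w = rng.normal(size=(32, 16))
hyper = HyperNetwork(d_emb=16, d_out=32, d_in=16, rank=4)
logits, labels = adapted_forward(hid, emb, w, hyper, k=2)
```

In this sketch the base weight `w` is never modified in place; each cluster sees `w + a @ b` only for the duration of its forward pass, which is one way to read the claim that no separate expert models need to be stored.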
Taxonomy
Topics: Video Surveillance and Tracking Methods · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Sparse Evolutionary Training
