TL;DR
This paper introduces a hypernetwork-driven meta-gating approach that adapts LLMs to textual conditions such as task, domain, persona, and style, outperforming finetuning and meta-learning baselines.
Contribution
It proposes a meta-gating mechanism inside the SwiGLU blocks that adaptively adjusts the nonlinearity of the FFN. A hypernetwork produces the gating signal from textual conditions, which gives dynamic control over the LLM.
Findings
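Outperforms finetuning and meta-learning baselines across task, domain, persona, and style conditions.
Generalizes reasonably to unseen tasks, condition types, or instructions.
Provides meta-controllability over LLMs through a signal conditioned on text.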
Abstract
Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While finetuning can cause catastrophic forgetting, applying meta-learning to LLMs is also limited by its complexity and scalability. In this paper, we activate the meta-signal within the SwiGLU blocks, resulting in a meta-gating mechanism that adaptively adjusts the nonlinearity of the FFN. A hypernetwork dynamically produces this signal from textual conditions, providing meta-controllability over LLMs. Tested on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines, and generalizes reasonably to unseen tasks, condition types, or instructions. Our code is available at https://github.com/AaronJi/MeGan.
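To make the mechanism in the abstract concrete, here is a minimal sketch of one way a hypernetwork could gate a SwiGLU feed-forward block. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the meta-signal, so the sketch assumes it acts as a per-unit slope `beta` inside the Swish gate, and that the hypernetwork is a single linear layer followed by softplus. All function names, parameter names, and shapes (`hypernetwork`, `meta_gated_swiglu`, `W_h`, `b_h`, and so on) are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def hypernetwork(cond, W_h, b_h):
    """Hypothetical hypernetwork: condition embedding -> one gate slope per hidden unit.

    Assumption: a single linear layer with softplus, so every slope stays positive.
    """
    return [math.log1p(math.exp(z + b)) for z, b in zip(matvec(W_h, cond), b_h)]


def meta_gated_swiglu(x, beta, W_gate, W_up, W_down):
    """SwiGLU FFN whose Swish gate uses a condition-dependent slope beta.

    With beta fixed to 1 for every unit this reduces to the standard SwiGLU block.
    """
    gate = matvec(W_gate, x)
    up = matvec(W_up, x)
    # Swish with slope b: g * sigmoid(b * g), multiplied by the linear "up" branch.
    hidden = [g * sigmoid(b * g) * u for g, b, u in zip(gate, beta, up)]
    return matvec(W_down, hidden)


if __name__ == "__main__":
    rng = random.Random(0)
    d_model, d_hidden, d_cond = 4, 8, 3

    def rand_matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    W_gate = rand_matrix(d_hidden, d_model)
    W_up = rand_matrix(d_hidden, d_model)
    W_down = rand_matrix(d_model, d_hidden)
    W_h = rand_matrix(d_hidden, d_cond)
    b_h = [0.0] * d_hidden

    x = [rng.uniform(-1.0, 1.0) for _ in range(d_model)]
    # Two made-up condition embeddings, e.g. two different personas or styles.
    for cond in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
        beta = hypernetwork(cond, W_h, b_h)
        print(meta_gated_swiglu(x, beta, W_gate, W_up, W_down))
```

In this sketch the FFN weights are shared across conditions and only the gate slope changes, so the same input is transformed differently depending on the condition embedding. Whether the actual method freezes the base weights or shapes the signal this way is not stated in the abstract.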
