TL;DR
This paper introduces skill neologisms: soft tokens added to a language model's vocabulary and optimized for a specific skill. They let a model acquire new skills without weight updates, and the paper shows that they are composable and effective on natural language tasks.
Contribution
The paper proposes skill neologisms as a novel method for skill-based continual learning, allowing models to acquire and compose new skills without catastrophic forgetting.
Findings
On a controlled synthetic task, skill neologisms can be learned to improve model capabilities on a specific skill.
The learned neologisms remain composable with out-of-distribution skills.
Zero-shot composition works on the Skill-Mix benchmark.
Abstract
Modern LLMs show mastery over an ever-growing range of skills, as well as the ability to compose them flexibly. However, extending model capabilities to new skills in a scalable manner is an open problem: fine-tuning and parameter-efficient variants risk catastrophic forgetting, while context-based approaches have limited expressiveness and are constrained by the model's effective context. We explore skill neologisms--soft tokens integrated in the model's vocabulary and optimized to improve capabilities over a specific skill--as a way to selectively acquire new skills without weight updates. We first observe that pre-trained LLMs already exhibit tokens associated with procedural knowledge. We then show on a controlled synthetic task that skill neologisms can be learned to improve model capabilities on specific skills while being composable with out-of-distribution skills, and that…
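The core mechanism in the abstract, a new vocabulary entry optimized while every existing weight stays frozen, can be illustrated with a toy sketch. This is not the paper's implementation: the mean-pooling model, the identity readout, the hand-picked embeddings, and the names `forward` and `learn_neologism` are all assumptions made for illustration. Only the appended embedding row receives gradient updates.

```python
import math

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(table, readout, token_ids):
    """Toy frozen model (assumption): mean-pool token embeddings, then a linear readout."""
    dim = len(table[0])
    pooled = [sum(table[t][i] for t in token_ids) / len(token_ids) for i in range(dim)]
    logits = [sum(w * h for w, h in zip(row, pooled)) for row in readout]
    return softmax(logits)

def learn_neologism(embeddings, readout, prompts, target, steps=200, lr=1.0):
    """Append one zero-initialized embedding row and train only that row."""
    dim = len(embeddings[0])
    new_id = len(embeddings)
    # Work on a copy; the original rows are read but never written.
    table = [row[:] for row in embeddings] + [[0.0] * dim]
    for _ in range(steps):
        for prompt in prompts:
            ids = [new_id] + prompt  # prepend the neologism to the prompt
            probs = forward(table, readout, ids)
            # Gradient of cross-entropy w.r.t. logits: probs - one_hot(target)
            dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
            # Chain rule through the readout and the mean pooling, new row only
            for i in range(dim):
                grad = sum(d * row[i] for d, row in zip(dlogits, readout)) / len(ids)
                table[new_id][i] -= lr * grad
    return table, new_id

# Hypothetical frozen "pre-trained" parameters: 4 tokens, 3 dimensions, 3 classes.
frozen_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0], [0.2, 0.1, 0.0]]
frozen_readout = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
prompts = [[0, 1], [2, 3], [0, 3]]
TARGET = 2  # the class standing in for the "skill" behaviour

table, skill_id = learn_neologism(frozen_embeddings, frozen_readout, prompts, TARGET)
for prompt in prompts:
    before = forward(table, frozen_readout, prompt)[TARGET]
    after = forward(table, frozen_readout, [skill_id] + prompt)[TARGET]
    print(prompt, round(before, 3), round(after, 3))
```

Because the original rows and the readout are never modified, prompts that omit the new token behave exactly as before, which is the property that avoids catastrophic forgetting in this toy setting. In a real LLM the analogous step would be extending the embedding matrix by one row and masking gradients for everything else.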
