Neologism Learning as a Parameter-Efficient Alternative to Fine-Tuning for Model Steering
Sungjoon Park, Varun Ramamurthi, Owen Terry

TL;DR
Neologism learning offers a parameter-efficient alternative to fine-tuning for steering language models, outperforming fine-tuning under similar conditions and enabling flexible behavior modification.
Contribution
The paper demonstrates that neologism learning can surpass low-rank adaptation (LoRA) fine-tuning on model steering tasks while training fewer parameters and preserving access to the model's default behavior.
Findings
Neologisms outperform LoRA fine-tuning on model steering tasks under a matched training setup (same data and hyperparameters).
Neologism learning trains only d parameters, fewer than fine-tuning.
Models sometimes generate new words when discussing neologisms.
Abstract
In language modeling, neologisms are new tokens trained to represent a concept not already included in a given model's vocabulary. Neologisms can be used to encourage specific behavior in models, for example by appending "Give me a neologism answer." to a prompt. Behavioral steering can also be achieved through fine-tuning, albeit with more compute and less flexibility: learning a neologism trains only d parameters and still lets the user access the model's default behavior. We compare the performance of neologism learning against low-rank adaptation (LoRA) fine-tuning, finding that neologisms outperform fine-tuned models under a matched training setup (same data and hyperparameters). We also investigate self-verbalizations of neologisms, and observe that the model will occasionally make up its own new words when asked about a neologism.
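To make the parameter-efficiency claim concrete, here is a minimal sketch comparing trainable parameter counts. It assumes that the "d parameters" in the abstract refers to a single new token embedding of dimension d, and it uses the standard LoRA count of rank × (d_in + d_out) per adapted weight matrix. The function names and all sizes (model width, rank, number of adapted matrices) are hypothetical and are not taken from the paper.

```python
def neologism_trainable_params(d_model: int, num_neologisms: int = 1) -> int:
    """Assumption: each neologism trains one embedding vector of size d_model,
    and every other model weight stays frozen."""
    return d_model * num_neologisms


def lora_trainable_params(d_in: int, d_out: int, rank: int,
                          num_adapted_matrices: int) -> int:
    """LoRA trains two low-rank factors per adapted matrix:
    A with shape (rank, d_in) and B with shape (d_out, rank)."""
    return rank * (d_in + d_out) * num_adapted_matrices


# Hypothetical sizes chosen only for illustration, not reported in the paper.
D_MODEL = 4096
neologism = neologism_trainable_params(D_MODEL)
lora = lora_trainable_params(D_MODEL, D_MODEL, rank=8, num_adapted_matrices=64)

print(f"neologism: {neologism} trainable parameters")
print(f"LoRA:      {lora} trainable parameters")
print(f"ratio:     {lora // neologism}x")
```

Under these illustrative sizes the LoRA adapter trains roughly a thousand times more parameters than a single neologism. The actual ratio depends on the rank and on which matrices are adapted.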
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
