Mixed-effects transformers for hierarchical adaptation
Julia White, Noah Goodman, Robert Hawkins

TL;DR
This paper introduces mixed-effects transformers (MET), a hierarchical adaptation method that uses structured prefixes to improve language model performance across diverse and sparse contexts, outperforming traditional prompting techniques.
Contribution
The paper proposes a novel hierarchical adaptation approach using mixed-effects models integrated with transformers via prefix-tuning, enabling efficient adaptation to new contexts with minimal data.
Findings
Efficient adaptation to novel contexts with minimal data
Effective generalization to unseen contexts
Outperforms prompt-based methods on domain-adaptation benchmarks
Abstract
Language use differs dramatically from context to context. To some degree, modern language models like GPT-3 are able to account for such variance by conditioning on a string of previous input text, or prompt. Yet prompting is ineffective when contexts are sparse, out-of-sample, or extra-textual; for instance, accounting for when and where the text was produced or who produced it. In this paper, we introduce the mixed-effects transformer (MET), a novel approach for learning hierarchically-structured prefixes -- lightweight modules prepended to the input -- to account for structured variation. Specifically, we show how the popular class of mixed-effects models may be extended to transformer-based architectures using a regularized prefix-tuning procedure with dropout. We evaluate this approach on several domain-adaptation benchmarks, finding that it efficiently adapts to novel contexts…
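The abstract describes the method only at a high level, so the following is a minimal sketch of how a mixed-effects prefix could be composed, not the authors' implementation. It assumes the standard mixed-effects decomposition (a shared "fixed-effect" prefix plus a per-context "random-effect" offset), an L2 penalty that shrinks offsets toward zero as the regularizer, and dropout applied to the whole context offset. All function names and hyperparameters (`init_prefixes`, `compose_prefix`, `l2_penalty`, `prepend_prefix`, `drop_prob`, `strength`) are hypothetical, and the prefix is prepended to input embeddings only for simplicity.

```python
import numpy as np

def init_prefixes(contexts, prefix_len, dim, seed=0):
    """Create a shared prefix and one zero-initialized offset per context."""
    rng = np.random.default_rng(seed)
    fixed = rng.normal(scale=0.02, size=(prefix_len, dim))  # shared across contexts
    random_effects = {c: np.zeros((prefix_len, dim)) for c in contexts}
    return fixed, random_effects

def compose_prefix(fixed, random_effects, context, drop_prob=0.0, rng=None):
    """Return fixed + context offset; fall back to fixed alone when the
    context is unseen or (assumed dropout scheme) the offset is dropped."""
    offset = random_effects.get(context)
    if offset is None:
        return fixed.copy()  # unseen context: only the shared prefix is available
    if rng is not None and rng.random() < drop_prob:
        return fixed.copy()  # training-time dropout of the context-specific part
    return fixed + offset

def l2_penalty(random_effects, strength):
    """Assumed regularizer: shrink every context offset toward zero."""
    return strength * sum(float(np.sum(v ** 2)) for v in random_effects.values())

def prepend_prefix(prefix, token_embeddings):
    """Prepend prefix vectors along the sequence axis of (seq_len, dim) embeddings."""
    return np.concatenate([prefix, token_embeddings], axis=0)

# Usage: two hypothetical contexts, prefix length 2, embedding size 4.
fixed, effects = init_prefixes(["forum_a", "forum_b"], prefix_len=2, dim=4)
prefix = compose_prefix(fixed, effects, "forum_a")
inputs = prepend_prefix(prefix, np.zeros((3, 4)))  # shape (5, 4)
```

Under these assumptions, a context with little data keeps an offset near zero and so behaves like the shared prefix, which is one way to read the abstract's claim about sparse and out-of-sample contexts.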
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Residual Connection · Softmax · Weight Decay · Adam · Cosine Annealing
