CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals
Scott Novotney, Sreeparna Mukherjee, Zeeshan Ahmed, Andreas Stolcke

TL;DR
This paper introduces CUE vectors, a modular framework for training language models conditioned on diverse external context. It supports adaptation to new context types and incremental training without jointly training the sentence and context encoders.
Contribution
The authors propose a novel modular training approach that separates sentence and context encoding, allowing easy adaptation to new metadata types and incremental learning.
Findings
Conditioning on context reduces perplexity from 36.6 to 27.4.
Training with only partial context retains 85% of the perplexity gains.
Pretrained sentence LMs can be swapped without retraining the context encoders.
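To put the reported figures in proportion, the arithmetic can be sketched as below. This is only an illustration: the helper names are hypothetical, and it assumes that "gains" means the absolute perplexity difference, which this summary does not specify.

```python
import math

# Perplexities quoted in the findings.
BASELINE_PPL = 36.6  # without context
CONTEXT_PPL = 27.4   # conditioned on context


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which perplexity drops relative to the baseline."""
    return (baseline - improved) / baseline


def ppl_at_retained_gain(baseline: float, best: float, retained: float) -> float:
    """Perplexity reached if `retained` (0..1) of the gain is kept.

    Assumption: the gain is the absolute perplexity difference.
    """
    return baseline - retained * (baseline - best)


def cross_entropy_nats(ppl: float) -> float:
    """Per-token cross-entropy in nats, using perplexity = exp(cross-entropy)."""
    return math.log(ppl)


if __name__ == "__main__":
    rel = relative_reduction(BASELINE_PPL, CONTEXT_PPL)
    partial = ppl_at_retained_gain(BASELINE_PPL, CONTEXT_PPL, 0.85)
    saved = cross_entropy_nats(BASELINE_PPL) - cross_entropy_nats(CONTEXT_PPL)
    print(f"relative perplexity reduction: {rel:.1%}")
    print(f"perplexity at 85% of the gain: {partial:.2f}")
    print(f"cross-entropy saved per token: {saved:.3f} nats")
```

Under that assumption, the full drop is about a quarter of the baseline perplexity, and keeping 85% of it would correspond to a perplexity of roughly 28.8.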
Abstract
We propose a framework to modularize the training of neural language models that use diverse forms of sentence-external context (including metadata) by eliminating the need to jointly train sentence-external and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one set of context, such as date and author, and adapts to novel metadata types, such as article title, or previous sentence. The model consists of a pretrained neural sentence LM, a BERT-based context encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and sentence-external information. When context or metadata are unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real contextual information can be introduced later and used to adapt a small number of…
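The three components named in the abstract can be pictured with a toy sketch. It is not the authors' implementation: the function names, the tiny hand-written vectors, the absence of learned projections, and the additive fusion are all assumptions chosen to keep the example dependency-free. It shows only the data flow, where a decoder reads frozen sentence-LM states under a causal mask and attends to separately produced context vectors.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys):
    """Scaled dot-product attention; keys double as values (no learned projections)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    return [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(query))]


def decode(sentence_states, context_vectors, output_matrix):
    """Return one next-token distribution per sentence position."""
    dists = []
    for t, state in enumerate(sentence_states):
        # Causal mask: position t only sees sentence states 0..t.
        internal = attend(state, sentence_states[: t + 1])
        # Cross-attention over the sentence-external context vectors.
        external = attend(state, context_vectors)
        # Assumed fusion: a simple residual-style sum of the three signals.
        fused = [s + i + e for s, i, e in zip(state, internal, external)]
        logits = [dot(row, fused) for row in output_matrix]
        dists.append(softmax(logits))
    return dists


# Stand-ins for a frozen sentence LM's hidden states (3 positions, dimension 2).
sentence_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Stand-ins for context-encoder outputs, e.g. one vector per metadata field.
context_vectors = [[0.5, -0.5], [2.0, 0.0]]
# Hypothetical output projection for a 3-word vocabulary.
output_matrix = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

if __name__ == "__main__":
    for t, dist in enumerate(decode(sentence_states, context_vectors, output_matrix)):
        print(t, [round(p, 3) for p in dist])
```

Because the sentence states and the context vectors enter the decoder as separate inputs, either side can be replaced without touching the other in this sketch, which is the modularity the abstract describes.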
