CUE Vectors: Modular Training of Language Models Conditioned on Diverse Contextual Signals

Scott Novotney; Sreeparna Mukherjee; Zeeshan Ahmed; Andreas Stolcke

arXiv:2203.08774 · cs.CL · February 9, 2024

TL;DR

This paper introduces CUE vectors, a modular framework for training language models conditioned on diverse sentence-external contexts. The framework supports flexible adaptation and incremental training without jointly training the context and sentence encoders.

Contribution

The authors propose a modular training approach that separates sentence encoding from context encoding, so the model can be adapted to new metadata types and trained incrementally.
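The modular split can be pictured with a small sketch. Everything here is hypothetical: `ContextRegistry`, `hash_encoder` and the toy vector size `DIM` are illustrative names, and the hash-based encoder merely stands in for a pretrained context encoder such as the BERT-based one named in the abstract. The sketch only shows that registering an encoder for a new metadata type leaves the existing encoders unchanged.

```python
from typing import Callable, Dict, List

# Hypothetical type: a context encoder maps one raw metadata value
# (e.g. an author name or a title string) to a fixed-size vector.
Encoder = Callable[[str], List[float]]

DIM = 4  # toy embedding size, chosen only for illustration


def hash_encoder(salt: str) -> Encoder:
    """Toy stand-in for a pretrained context encoder.

    It derives a deterministic vector from the text so the sketch runs
    without any model weights.
    """
    def encode(text: str) -> List[float]:
        vec = [0.0] * DIM
        for i, ch in enumerate(salt + text):
            vec[i % DIM] += (ord(ch) % 17) / 17.0
        return vec
    return encode


class ContextRegistry:
    """Hypothetical registry holding one encoder per metadata type.

    Adding a type registers a new encoder; existing encoders (and any
    sentence LM) are left untouched.
    """

    def __init__(self) -> None:
        self.encoders: Dict[str, Encoder] = {}

    def register(self, meta_type: str, encoder: Encoder) -> None:
        self.encoders[meta_type] = encoder

    def encode(self, metadata: Dict[str, str]) -> Dict[str, List[float]]:
        # Metadata types without a registered encoder are skipped, so a
        # document with partial context still yields some vectors.
        return {
            name: self.encoders[name](value)
            for name, value in metadata.items()
            if name in self.encoders
        }
```

Because each metadata type owns its encoder, a type such as article title can be registered after the others already exist, and documents lacking some metadata simply produce fewer vectors.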

Findings

01

Conditioning on context reduces perplexity from 36.6 to 27.4.

02

Bootstrapping the model with only a subset of the context during training retains 85% of the achievable gain.

03

One pretrained sentence LM can be swapped for another without retraining the context encoders; only the decoder needs to be adapted.
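The headline numbers can be related with simple arithmetic. This sketch assumes that "gain" means the drop in perplexity points from the no-context baseline; the paper may measure it differently, and the helper names are hypothetical.

```python
def relative_reduction(baseline_ppl: float, context_ppl: float) -> float:
    """Fraction by which perplexity drops when context is added."""
    return (baseline_ppl - context_ppl) / baseline_ppl


def retained_gain(baseline_ppl: float, full_ppl: float, partial_ppl: float) -> float:
    """Share of the full-context gain kept by a partial-context model.

    Assumption: gain is the drop in perplexity points from the baseline.
    """
    return (baseline_ppl - partial_ppl) / (baseline_ppl - full_ppl)


def ppl_for_retained(baseline_ppl: float, full_ppl: float, fraction: float) -> float:
    """Perplexity implied by keeping `fraction` of the full-context gain."""
    return baseline_ppl - fraction * (baseline_ppl - full_ppl)


print(f"{relative_reduction(36.6, 27.4):.3f}")      # prints 0.251
print(f"{ppl_for_retained(36.6, 27.4, 0.85):.2f}")  # prints 28.78
```

Going from 36.6 to 27.4 is therefore roughly a 25% relative reduction. Under the stated assumption, keeping 85% of the gain would correspond to a perplexity near 28.8; that figure is derived arithmetic, not a number reported by the authors.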

Abstract

We propose a framework to modularize the training of neural language models that use diverse forms of sentence-external context (including metadata) by eliminating the need to jointly train sentence-external and within-sentence encoders. Our approach, contextual universal embeddings (CUE), trains LMs on one set of context, such as date and author, and adapts to novel metadata types, such as article title, or previous sentence. The model consists of a pretrained neural sentence LM, a BERT-based context encoder, and a masked transformer decoder that estimates LM probabilities using sentence-internal and sentence-external information. When context or metadata are unavailable, our model learns to combine contextual and sentence-internal information using noisy oracle unigram embeddings as a proxy. Real contextual information can be introduced later and used to adapt a small number of…
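The abstract names two mechanisms that a short sketch can make concrete: a masked decoder that combines context vectors with preceding sentence tokens, and noisy oracle unigram embeddings used as proxy context. Both functions below are hypothetical readings, not the authors' code. The mask assumes context vectors are prepended to the sentence and visible to every position, while sentence tokens attend causally. The proxy assumes the "oracle" is the mean embedding of the sentence's own words, perturbed with Gaussian noise so it is only a rough hint of the target text.

```python
import random
from typing import Dict, List, Sequence


def build_cue_mask(n_ctx: int, n_tok: int) -> List[List[bool]]:
    """Attention mask for a sequence [ctx_1..ctx_k, tok_1..tok_n].

    mask[q][k] is True when query position q may attend to key position k.
    Assumed layout: context slots are visible to every position, context
    slots never look at sentence tokens, and sentence tokens are causal.
    """
    size = n_ctx + n_tok
    mask = [[False] * size for _ in range(size)]
    for q in range(size):
        for k in range(size):
            if k < n_ctx:
                mask[q][k] = True  # sentence-external context is always visible
            elif q >= n_ctx:
                mask[q][k] = k <= q  # a token sees itself and earlier tokens only
    return mask


def noisy_oracle_unigram(
    tokens: Sequence[str],
    table: Dict[str, List[float]],
    noise_std: float,
    rng: random.Random,
) -> List[float]:
    """Hypothetical proxy context: noisy mean embedding of the sentence's words."""
    dim = len(next(iter(table.values())))
    known = [table[t] for t in tokens if t in table]
    if not known:
        return [0.0] * dim  # no usable words: fall back to a zero vector
    mean = [sum(col) / len(known) for col in zip(*known)]
    # Gaussian noise blurs the oracle so it only hints at the target words.
    return [x + rng.gauss(0.0, noise_std) for x in mean]
```

For example, `build_cue_mask(2, 3)` lets the first sentence token see both context slots and itself, while the two context slots never see any sentence token. Swapping the proxy vector for a real context embedding later leaves the mask unchanged.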
