Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models
Vilém Zouhar, Marius Mosbach, Dietrich Klakow

TL;DR
This paper introduces a method to enhance autoregressive language models by fusing prefix embeddings from masked language models, improving perplexity and demonstrating potential for integrating diverse information sources.
Contribution
It proposes a novel fusion approach to incorporate prefix embeddings into LSTM-based autoregressive models, improving perplexity and enabling multi-source information integration.
Findings
Fusion reduces perplexity from 16.74 to 15.80.
Perplexity improvement persists across different domains.
Surprisal estimates do not correlate better with human reading times.
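Perplexity and surprisal both come from the same per-token probabilities. Surprisal is -log2 p(w_t | w_<t), and perplexity is the exponentiated mean negative log-likelihood, which equals 2 raised to the mean surprisal in bits. The minimal sketch below computes both from toy probabilities. For the reading-time comparison it uses a plain Pearson correlation, which is an illustrative assumption: the paper's actual psycholinguistic analysis may use a different statistic. All function names and numbers (including the reading times) are hypothetical.

```python
import math
from typing import Sequence


def surprisal_bits(p: float) -> float:
    """Surprisal of a word with model probability p, in bits: -log2 p."""
    return -math.log2(p)


def perplexity(token_probs: Sequence[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) over all tokens."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy per-token probabilities p(w_t | w_<t) from some language model (hypothetical).
probs = [0.5, 0.25, 0.125]
surprisals = [surprisal_bits(p) for p in probs]  # 1, 2 and 3 bits
ppl = perplexity(probs)  # equals 2 ** mean(surprisals), i.e. 4 up to float rounding

# Made-up per-word reading times in milliseconds, purely for illustration.
reading_times_ms = [210.0, 245.0, 290.0]
r = pearson(surprisals, reading_times_ms)
```

A lower perplexity means the model assigns higher probability to the observed words on average. That does not by itself guarantee that per-word surprisals line up better with human reading behaviour, which is consistent with the finding above.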
Abstract
Although masked language models are highly performant and widely adopted by NLP practitioners, they cannot easily be used for autoregressive language modelling (next-word prediction and sequence probability estimation). We present an LSTM-based autoregressive language model that uses prefix embeddings (from a pretrained masked language model) via fusion (e.g. concatenation) to obtain a richer context representation for language modelling. We find that fusion reliably lowers perplexity (16.74 → 15.80), and the improvement is preserved even after transfer to a dataset from a different domain than the training data. We also evaluate the best-performing fusion model by correlating its next-word surprisal estimates with human reading times. Contrary to our expectation, and despite the overall improvement in perplexity, the correlation remains the same as for the baseline…
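The summary does not say where in the network the prefix embedding is fused. The minimal sketch below assumes concatenation at the output layer. The LSTM hidden state h_t and a prefix embedding e_prefix (which would come from a pretrained masked language model encoding w_1 … w_{t-1}) are joined as [h_t ; e_prefix], then projected to vocabulary logits and normalized with a softmax. Both vectors are stubbed with random numbers here. The function names, dimensions and fusion point are hypothetical; this is an illustration of concatenation fusion, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def concat_fusion(h_t: Sequence[float], e_prefix: Sequence[float]) -> Vector:
    """Fuse the LSTM state and the prefix embedding by concatenation: [h_t ; e_prefix]."""
    return list(h_t) + list(e_prefix)


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over vocabulary logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def next_word_distribution(
    h_t: Sequence[float], e_prefix: Sequence[float], W: Matrix, b: Sequence[float]
) -> Vector:
    """p(w_t | w_<t) from the fused context vector via a linear layer and softmax."""
    fused = concat_fusion(h_t, e_prefix)
    # W has one row per vocabulary item, each with len(h_t) + len(e_prefix) weights.
    logits = [sum(w * x for w, x in zip(row, fused)) + b_i for row, b_i in zip(W, b)]
    return softmax(logits)


# Toy dimensions and random parameters (hypothetical, for illustration only).
rng = random.Random(0)
hidden_dim, prefix_dim, vocab_size = 4, 3, 5
h_t = [rng.uniform(-1, 1) for _ in range(hidden_dim)]       # stand-in for the LSTM hidden state
e_prefix = [rng.uniform(-1, 1) for _ in range(prefix_dim)]  # stand-in for the masked-LM prefix embedding
W = [[rng.gauss(0, 0.1) for _ in range(hidden_dim + prefix_dim)] for _ in range(vocab_size)]
b = [0.0] * vocab_size

probs = next_word_distribution(h_t, e_prefix, W, b)
```

Concatenation is the simplest fusion choice because the output layer learns how much weight to give each source. The price is a wider projection matrix, since its input size grows from the hidden dimension to the hidden dimension plus the prefix-embedding dimension.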
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Balanced Selection
