Will it Unblend?
Yuval Pinter, Cassandra L. Jacobs, Jacob Eisenstein

TL;DR
This paper investigates how large-scale contextual language models such as BERT interpret out-of-vocabulary (OOV) blends, shows where they fail to access the meanings of the fused words, and compares how well different embedding systems recognize blend structure and recover blend origins.
Contribution
The study introduces a novel dataset of English OOV blends and evaluates the performance of various models in recognizing and interpreting these complex words.
Findings
BERT's representations are semantically impoverished for blends.
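The loss of characters during blend formation is the main reason the component meanings are not fully accessed.
Context-aware embeddings outperform the other models at recognizing blend structure and recovering blend origins, but still struggle.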
Abstract
Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as "innoventor", are one particularly challenging class of OOV, as they are formed by fusing together two or more bases that relate to the intended meaning in unpredictable manners and degrees. In this work, we run experiments on a novel dataset of English OOV blends to quantify the difficulty of interpreting the meanings of blends by large-scale contextual language models such as BERT. We first show that BERT's processing of these blends does not fully access the component meanings, leaving their contextual representations semantically impoverished. We find this is mostly due to the loss of characters resulting from blend formation. Then, we assess how easily different models can recognize the structure and recover the origin of blends, and find that…
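To make the notion of character loss concrete, here is a minimal Python sketch of how a blend can be split into a prefix of one base and a suffix of another, along with the characters each base drops. It is an illustration only, not the paper's method: the function name split_blend is hypothetical, the prefix-plus-suffix model covers only one simple kind of blend, and the bases "innovator" and "inventor" for "innoventor" are an assumption, since the abstract does not name them.

```python
from typing import List, Tuple


def split_blend(blend: str, first: str, second: str) -> List[Tuple[str, str, str, str]]:
    """Enumerate ways to read `blend` as prefix-of-`first` + suffix-of-`second`.

    Each result is (prefix, suffix, dropped_from_first, dropped_from_second).
    Hypothetical helper for illustration; real blends can be formed in other
    ways that this simple model does not cover.
    """
    splits = []
    for i in range(1, len(blend)):
        prefix, suffix = blend[:i], blend[i:]
        if first.startswith(prefix) and second.endswith(suffix):
            dropped_first = first[len(prefix):]                   # tail lost from the first base
            dropped_second = second[:len(second) - len(suffix)]   # head lost from the second base
            splits.append((prefix, suffix, dropped_first, dropped_second))
    return splits


# Assumed bases for the abstract's example blend.
for prefix, suffix, lost_a, lost_b in split_blend("innoventor", "innovator", "inventor"):
    print(f"{prefix}|{suffix}  drops '{lost_a}' and '{lost_b}'")
```

Under these assumptions the example has more than one valid split point, because the bases share characters around the seam. That ambiguity is one reason recovering a blend's structure and origin from its surface form is not trivial.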
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Linear Layer · Adam · Softmax · Dense Connections · Weight Decay · Dropout · Linear Warmup With Linear Decay · Attention Dropout · Layer Normalization
