Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings
Na Li, Zied Bouraoui, Jose Camacho Collados, Luis Espinosa-Anke, Qing Gu, Steven Schockaert

TL;DR
This paper demonstrates that averaging masked contextualised embeddings from BERT produces high-quality noun vectors that outperform traditional static embeddings in capturing semantic properties, especially when filtering out idiosyncratic mentions.
Contribution
It introduces a simple method that represents a noun by averaging the BERT embeddings of its masked mentions, along with a filtering strategy that improves semantic property induction.
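The averaging step can be pictured with a minimal sketch. It assumes a generic encoder that maps a token list to one vector per token. The toy_encode function below is a hypothetical stand-in for BERT; in practice one would run a masked language model and read vectors from a hidden layer, and which layer and tokeniser the authors use is not stated here. Whitespace tokenisation is also a simplifying assumption. Each mention of the noun is replaced by [MASK], the vector at the masked position is kept, and the noun vector is the mean of these mention vectors.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Encoder = Callable[[List[str]], List[Vector]]

MASK = "[MASK]"


def masked_mention_vectors(noun: str, sentences: Sequence[str], encode: Encoder) -> List[Vector]:
    """Return one contextualised vector per mention of `noun`, with that mention masked."""
    vectors = []
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: whitespace tokens, single-token nouns
        for i, tok in enumerate(tokens):
            if tok == noun:
                # Mask only this mention, so the vector must be inferred from context.
                masked = tokens[:i] + [MASK] + tokens[i + 1:]
                vectors.append(encode(masked)[i])
    return vectors


def average(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("no mentions found")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def toy_encode(tokens: List[str]) -> List[Vector]:
    """Hypothetical stand-in for BERT: each position gets a 2-d vector
    built from the lengths of its left and right neighbours."""
    out = []
    for i in range(len(tokens)):
        left = len(tokens[i - 1]) if i > 0 else 0
        right = len(tokens[i + 1]) if i + 1 < len(tokens) else 0
        out.append([float(left), float(right)])
    return out


sentences = ["the banana is yellow", "a ripe banana tastes sweet"]
mentions = masked_mention_vectors("banana", sentences, toy_encode)
noun_vec = average(mentions)  # noun_vec == [3.5, 4.0]
```

Because the target word is masked, every mention vector is predicted purely from its context; swapping toy_encode for a real masked language model leaves the rest of the pipeline unchanged.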
Findings
Averaged masked embeddings outperform both BERT's static word vectors and standard word embedding models on property induction tasks.
Masking the target words is critical to this performance, as it focuses the vectors on general semantic properties.
Filtering out idiosyncratic mention vectors further improves property induction performance.
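The filtering criterion itself is not described in this summary. As an illustrative substitute (an assumption, not the authors' method), the sketch below treats a mention vector as idiosyncratic when its cosine similarity to the unfiltered mean falls below a hypothetical threshold parameter, drops it, and re-averages the remaining mentions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filtered_average(mentions: Sequence[Vector], threshold: float = 0.5) -> Vector:
    """Average only the mentions that agree with the overall centroid.

    Hypothetical stand-in for selective averaging: mentions whose cosine
    similarity to the unfiltered mean is below `threshold` are dropped.
    Falls back to the unfiltered mean if no mention survives.
    """
    if not mentions:
        raise ValueError("no mentions to average")
    centroid = mean(mentions)
    kept = [m for m in mentions if cosine(m, centroid) >= threshold]
    return mean(kept) if kept else centroid


mentions = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
noun_vec = filtered_average(mentions)
```

In this toy input the third mention points away from the other two, so it is discarded and the result is the mean of the first two vectors. A real system could replace the cosine test with any other per-mention score without changing the surrounding code.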
Abstract
While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, such vectors continue to play an important role in tasks where words need to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors for such domains, in particular related to knowledge base completion, where our focus is on capturing the semantic properties of nouns. We find that a simple strategy of averaging the contextualised embeddings of masked word mentions leads to vectors that outperform the static word vectors learned by BERT, as well as those from standard word embedding models, in property induction tasks. We notice in particular that masking target words is critical to achieve this strong performance, as the…
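Property induction asks whether a noun has a given property using only its vector, for example whether "canary" should be labelled yellow given labelled examples such as "banana" and "crow". The classifier used in the paper is not stated here; the sketch below swaps in a simple nearest-centroid rule over hypothetical 2-d noun vectors, purely to show how the averaged vectors plug into such a task.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def centroid(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_property(positives: Sequence[Vector], negatives: Sequence[Vector]) -> Callable[[Vector], bool]:
    """Return a predictor for one property from labelled noun vectors."""
    pos_c, neg_c = centroid(positives), centroid(negatives)

    def predict(v: Vector) -> bool:
        # Assign the property if v lies closer to the positive centroid.
        return distance(v, pos_c) < distance(v, neg_c)

    return predict


# Hypothetical noun vectors, standing in for averaged masked-mention embeddings.
nouns: Dict[str, Vector] = {
    "banana": [0.9, 0.1],
    "lemon": [0.8, 0.2],
    "crow": [0.1, 0.9],
    "coal": [0.2, 0.8],
    "canary": [0.7, 0.4],
}
is_yellow = train_property(
    positives=[nouns["banana"], nouns["lemon"]],
    negatives=[nouns["crow"], nouns["coal"]],
)
```

Better noun vectors make such per-property classifiers more accurate, which is how the quality of different embedding strategies can be compared.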
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Linear Warmup With Linear Decay · Attention Is All You Need · Layer Normalization · Dropout · Weight Decay · Dense Connections · Adam · Attention Dropout
