GiBERT: Introducing Linguistic Knowledge into BERT through a Lightweight Gated Injection Method
Nicole Peinelt, Marek Rei, Maria Liakata

TL;DR
This paper presents GiBERT, a method to enhance BERT by injecting linguistic knowledge through lightweight gating, improving semantic understanding especially with synonyms and dependency information.
Contribution
GiBERT introduces a novel lightweight gating mechanism to inject linguistic knowledge into BERT at any layer, enhancing its semantic capabilities.
Findings
Improved performance on semantic similarity datasets
Counter-fitted embeddings aid synonym recognition
Linguistic knowledge injection benefits BERT's understanding
Abstract
Large pre-trained language models such as BERT have been the driving force behind recent improvements across many NLP tasks. However, BERT is only trained to predict missing words - either behind masks or in the next sentence - and has no knowledge of lexical, syntactic or semantic information beyond what it picks up through unsupervised pre-training. We propose a novel method to explicitly inject linguistic knowledge in the form of word embeddings into any layer of a pre-trained BERT. Our performance improvements on multiple semantic similarity datasets when injecting dependency-based and counter-fitted embeddings indicate that such information is beneficial and currently missing from the original model. Our qualitative analysis shows that counter-fitted embedding injection particularly helps with cases involving synonym pairs.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Software Engineering Research
MethodsLinear Layer · Adam · Softmax · Layer Normalization · Dense Connections · Multi-Head Attention · Dropout · Refunds@Expedia|||How do I get a full refund from Expedia? · Linear Warmup With Linear Decay · Attention Dropout
