Interpreting Language Models Through Knowledge Graph Extraction
Vinitra Swamy, Angelika Romanou, Martin Jaggi

TL;DR
This paper introduces a methodology to analyze and compare transformer-based language models by extracting and examining knowledge graphs at different training stages, providing insights beyond traditional accuracy metrics.
Contribution
It presents a novel framework for quantifying knowledge acquisition in language models through knowledge graph extraction and comparison across model variants and training stages.
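One simple way to picture "comparison across model variants and training stages" is to treat each extracted knowledge graph as a set of (subject, relation, object) triples and diff two such sets. The sketch below is illustrative only. It assumes a plain set-of-triples representation, and the `Triple` type, the `compare_graphs` helper and its gained/lost/shared/Jaccard summary are all hypothetical choices, not the paper's own metric. The example triples are toy data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A (subject, relation, object) edge in an extracted knowledge graph."""
    subject: str
    relation: str
    obj: str


def compare_graphs(earlier: set[Triple], later: set[Triple]) -> dict[str, float]:
    """Compare two knowledge-graph extracts treated as sets of triples.

    Hypothetical summary for illustration: how many triples were gained,
    lost, or kept between snapshots, plus the Jaccard similarity of the
    two edge sets (1.0 when both graphs are empty).
    """
    shared = earlier & later
    union = earlier | later
    return {
        "gained": len(later - earlier),
        "lost": len(earlier - later),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy snapshots standing in for graphs extracted at two checkpoints.
early = {Triple("Paris", "capital_of", "France"), Triple("dog", "is_a", "cat")}
late = {
    Triple("Paris", "capital_of", "France"),
    Triple("dog", "is_a", "animal"),
    Triple("Berlin", "capital_of", "Germany"),
}
result = compare_graphs(early, late)
print(result)  # {'gained': 2, 'lost': 1, 'shared': 1, 'jaccard': 0.25}
```

Set overlap ignores graph structure such as paths or node degree. It is only the most basic way to tell whether a later snapshot has kept, corrected, or added relations.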
Findings
Knowledge graphs reveal the evolution of model understanding during training.
Different BERT variants exhibit distinct linguistic strengths.
The framework enables targeted model diagnostics and dataset improvements.
Abstract
Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing community and are commonly used as a starting point for downstream tasks. While these models are undeniably useful, it is a challenge to quantify their performance beyond traditional accuracy metrics. In this paper, we compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process. Structured relationships from training corpora may be uncovered through querying a masked language model with probing tasks. We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements at various stages of RoBERTa's early training. We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base,…
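The cloze-probing step described in the abstract can be sketched as follows. Each relation gets a fill-in-the-blank template. The template is instantiated with a subject, and the masked language model's top prediction becomes the object of a triple. This is a minimal sketch under stated assumptions and not the authors' implementation. The `FillMask` interface, the `TEMPLATES` dictionary, the `min_score` threshold and `toy_fill_mask` are all hypothetical. The toy function returns canned scores in place of a real checkpoint. A real version might wrap a masked LM (for example, Hugging Face's `fill-mask` pipeline, which returns dicts with `token_str` and `score`). Such a version would also need the model's own mask token, which is `<mask>` for RoBERTa.

```python
from typing import Callable

# Hypothetical interface: given a cloze sentence containing "[MASK]",
# return candidate fillers with scores, best first.
FillMask = Callable[[str], list[tuple[str, float]]]

# Hypothetical relation templates; "[X]" is the subject slot.
TEMPLATES = {
    "capital_of": "[X] is the capital of [MASK].",
    "is_a": "A [X] is a kind of [MASK].",
}


def extract_triples(subjects, fill_mask: FillMask, min_score: float = 0.1):
    """Query every (subject, relation) cloze and keep the top prediction as an edge."""
    triples = set()
    for subject in subjects:
        for relation, template in TEMPLATES.items():
            prompt = template.replace("[X]", subject)
            predictions = fill_mask(prompt)
            if not predictions:
                continue
            token, score = predictions[0]  # top-1 filler
            if score >= min_score:  # drop low-confidence guesses
                triples.add((subject, relation, token))
    return triples


def toy_fill_mask(prompt: str) -> list[tuple[str, float]]:
    """Stand-in for one model checkpoint; canned outputs for illustration only."""
    canned = {
        "Paris is the capital of [MASK].": [("France", 0.9), ("Europe", 0.05)],
        "A dog is a kind of [MASK].": [("animal", 0.6), ("pet", 0.3)],
    }
    return canned.get(prompt, [("the", 0.02)])


graph = extract_triples(["Paris", "dog"], toy_fill_mask)
print(sorted(graph))  # [('Paris', 'capital_of', 'France'), ('dog', 'is_a', 'animal')]
```

To build the kind of timeline the abstract describes, you would run the same templates and subjects against each saved training checkpoint and compare the resulting triple sets.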
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay · WordPiece · Dropout · Residual Connection
