Interpreting Language Models Through Knowledge Graph Extraction

Vinitra Swamy; Angelika Romanou; Martin Jaggi

arXiv:2111.08546 · cs.LG · November 17, 2021

TL;DR

This paper introduces a methodology to analyze and compare transformer-based language models by extracting and examining knowledge graphs at different training stages, providing insights beyond traditional accuracy metrics.

Contribution

It presents a novel framework for quantifying knowledge acquisition in language models through knowledge graph extraction and comparison across model variants and training stages.
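A minimal sketch of how two knowledge-graph extracts (for example, from two model variants) might be compared. It assumes each extract is stored as a set of (subject, relation, object) triples. The helper name `compare_graphs`, the toy triples, and the Jaccard summary are illustrative choices and are not necessarily the comparison metrics used in the paper.

```python
from typing import Dict, Set, Tuple

# A knowledge-graph extract represented as a set of (subject, relation, object) triples.
Triple = Tuple[str, str, str]


def compare_graphs(a: Set[Triple], b: Set[Triple]) -> Dict[str, float]:
    """Summarize how much two knowledge-graph extracts overlap."""
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),   # facts both models produced
        "only_a": len(a - b),    # facts unique to the first model
        "only_b": len(b - a),    # facts unique to the second model
        # Jaccard similarity: 1.0 for identical extracts, 0.0 for disjoint ones
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy extracts standing in for two model variants (hypothetical data).
kg_variant_a = {("Paris", "capital_of", "France"), ("Mozart", "born_in", "Vienna")}
kg_variant_b = {("Paris", "capital_of", "France"), ("Mozart", "born_in", "Salzburg")}

stats = compare_graphs(kg_variant_a, kg_variant_b)
print(stats)  # one shared triple, one unique to each side, jaccard = 1/3
```

Plain sets keep the comparison symmetric and cheap. Structure-aware measures such as graph edit distance would require a graph library and are not shown here.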

Findings

01

Knowledge graphs reveal the evolution of model understanding during training.

02

Different BERT variants exhibit distinct linguistic strengths.

03

The framework enables targeted model diagnostics and dataset improvements.
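To make the idea of a knowledge acquisition timeline concrete, here is a hypothetical sketch. It takes knowledge-graph extracts from several training checkpoints and reports when each fact first appears, along with the facts gained and lost between consecutive snapshots. The function names `first_seen` and `step_changes`, the step numbers, and the triples are made-up illustrations and do not come from the paper or its repository.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
Snapshot = Tuple[int, Set[Triple]]  # (training step, knowledge-graph extract)


def first_seen(snapshots: List[Snapshot]) -> Dict[Triple, int]:
    """Map each triple to the earliest training step whose extract contains it."""
    seen: Dict[Triple, int] = {}
    for step, graph in sorted(snapshots, key=lambda s: s[0]):
        for triple in graph:
            seen.setdefault(triple, step)  # keep only the first occurrence
    return seen


def step_changes(snapshots: List[Snapshot]) -> List[Tuple[int, Set[Triple], Set[Triple]]]:
    """For each checkpoint after the first, list triples gained and lost since the previous one."""
    ordered = sorted(snapshots, key=lambda s: s[0])
    changes = []
    for (_, prev), (step, cur) in zip(ordered, ordered[1:]):
        changes.append((step, cur - prev, prev - cur))
    return changes


# Hypothetical extracts from three early-training checkpoints (deliberately unordered).
snapshots: List[Snapshot] = [
    (30_000, {("Paris", "capital_of", "France"), ("Mozart", "born_in", "Salzburg")}),
    (10_000, {("Paris", "capital_of", "Europe")}),
    (20_000, {("Paris", "capital_of", "France")}),
]

timeline = first_seen(snapshots)
for step, gained, lost in step_changes(snapshots):
    print(step, "gained:", sorted(gained), "lost:", sorted(lost))
```

Sorting by step inside each function means the checkpoints can be collected in any order. Tracking lost triples as well as gained ones shows when an early, wrong fill (here, "Europe") is replaced by a better one.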

Abstract

Transformer-based language models trained on large text corpora have enjoyed immense popularity in the natural language processing community and are commonly used as a starting point for downstream tasks. While these models are undeniably useful, it is a challenge to quantify their performance beyond traditional accuracy metrics. In this paper, we compare BERT-based language models through snapshots of acquired knowledge at sequential stages of the training process. Structured relationships from training corpora may be uncovered through querying a masked language model with probing tasks. We present a methodology to unveil a knowledge acquisition timeline by generating knowledge graph extracts from cloze "fill-in-the-blank" statements at various stages of RoBERTa's early training. We extend this analysis to a comparison of pretrained variations of BERT models (DistilBERT, BERT-base,…
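The cloze-querying step described in the abstract can be sketched as follows. The sketch assumes that each relation is expressed by a hand-written template with a single mask slot, and that any fill scoring above a threshold becomes an edge. `TEMPLATES`, `extract_triples`, `min_score`, and `toy_predictor` are hypothetical names. The predictor is a fixed lookup table of toy values rather than a trained model, so the snippet runs without downloads. With Hugging Face Transformers, a `pipeline("fill-mask", model=...)` object could fill the predictor's role: its results carry `token_str` and `score` fields, and the `[MASK]` placeholder would need to be replaced by the model's own mask token (for RoBERTa, `<mask>`).

```python
from typing import Callable, Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical cloze templates: relation name -> statement with one mask slot.
TEMPLATES: Dict[str, str] = {
    "capital_of": "{subj} is the capital of [MASK].",
    "born_in": "{subj} was born in [MASK].",
}


def extract_triples(
    subjects: List[str],
    predict_mask: Callable[[str], List[Tuple[str, float]]],
    min_score: float = 0.1,
) -> Set[Triple]:
    """Query a masked LM with cloze statements and keep confident fills as triples."""
    triples: Set[Triple] = set()
    for relation, template in TEMPLATES.items():
        for subj in subjects:
            prompt = template.format(subj=subj)
            # predict_mask returns (token, probability) candidates for the mask slot
            for token, score in predict_mask(prompt):
                if score >= min_score:
                    triples.add((subj, relation, token.strip()))
    return triples


def toy_predictor(prompt: str) -> List[Tuple[str, float]]:
    """Stand-in for a real masked LM: fixed, made-up candidates for known prompts."""
    table = {
        "Paris is the capital of [MASK].": [("France", 0.92), ("Europe", 0.03)],
        "Mozart was born in [MASK].": [("Salzburg", 0.61), ("Vienna", 0.22)],
    }
    return table.get(prompt, [])


kg = extract_triples(["Paris", "Mozart"], toy_predictor)
print(sorted(kg))
# → [('Mozart', 'born_in', 'Salzburg'), ('Mozart', 'born_in', 'Vienna'), ('Paris', 'capital_of', 'France')]
```

The threshold trades graph size for precision. A low `min_score` admits plausible but wrong fills such as the toy `("Mozart", "born_in", "Vienna")` edge, while a high one keeps only the model's most confident predictions.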

Code & Models

Repositories

epfml/interpret-lm-knowledge (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Graph Neural Networks

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay · WordPiece · Dropout · Residual Connection