Hidden Schema Networks

Ramsés J. Sánchez, Lukas Conrads, Pascal Welke, Kostadin Cvejoski, and César Ojeda

arXiv:2207.03777 · cs.CL · May 29, 2023

Open Access

TL;DR

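This paper introduces a neural language model that enforces explicit relational structure on the output representations of pretrained language models. Sentences are encoded as sequences of symbols that trace walks on a global latent graph, with pretrained BERT and GPT-2 serving as encoder and decoder, which is intended to make the representations more interpretable and more useful for reasoning.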

Contribution

The paper proposes a novel approach for inferring explicit schema networks from language models, encoding semantic relations as graph structures to improve interpretability and reasoning.

Findings

01. The model can uncover ground-truth graphs from synthetic data.

02. Pretrained models can be conditioned on symbolic schema representations.

03. Schema networks improve commonsense reasoning performance.

Abstract

Large, pretrained language models infer powerful representations that encode rich semantic and syntactic content, albeit implicitly. In this work we introduce a novel neural language model that enforces, via inductive biases, explicit relational structures which allow for compositionality onto the output representations of pretrained language models. Specifically, the model encodes sentences into sequences of symbols (composed representations), which correspond to the nodes visited by biased random walkers on a global latent graph, and infers the posterior distribution of the latter. We first demonstrate that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences. Next, we leverage pretrained BERT and GPT-2 language models as encoder and decoder, respectively, to infer networks of symbols (schemata) from natural language datasets.…
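The abstract describes sentences as sequences of symbols visited by biased random walkers on a global graph. A minimal sketch of that idea follows, assuming a small hand-written adjacency matrix and fixed per-node bias weights; the function name `sample_walk`, the graph `ADJ`, and the bias values are hypothetical illustrations, and the paper's actual walker bias and posterior inference are not reproduced here.

```python
import random


def sample_walk(adjacency, length, bias=None, seed=None):
    """Sample a sequence of node indices from a biased random walk.

    adjacency: square list of lists with 0/1 entries (hypothetical symbol graph).
    bias: optional positive weight per node; larger values make a node more
          likely to be chosen among the current node's neighbours (assumption).
    """
    rng = random.Random(seed)
    n = len(adjacency)
    if bias is None:
        bias = [1.0] * n  # unbiased walk by default

    # Draw the starting node in proportion to the bias weights.
    node = rng.choices(range(n), weights=bias, k=1)[0]
    walk = [node]

    for _ in range(length - 1):
        neighbours = [j for j in range(n) if adjacency[node][j]]
        if not neighbours:
            break  # isolated node: the walk cannot continue
        weights = [bias[j] for j in neighbours]
        node = rng.choices(neighbours, weights=weights, k=1)[0]
        walk.append(node)
    return walk


# Hypothetical 4-node ring graph: each symbol connects to two neighbours.
ADJ = [
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
]

# One "sentence" of six symbols; nodes 1 and 3 are favoured by the bias.
walk = sample_walk(ADJ, length=6, bias=[1.0, 2.0, 1.0, 2.0], seed=0)
print(walk)
```

Every consecutive pair in the returned sequence is an edge of the graph, which is the structural constraint a schema network would place on the symbol sequences.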

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Layer Normalization · Discriminative Fine-Tuning · Weight Decay