Hidden Schema Networks
Ramsés J. Sánchez, Lukas Conrads, Pascal Welke, Kostadin Cvejoski, and César Ojeda

TL;DR
This paper introduces a neural language model that imposes explicit relational structure on the output representations of pretrained language models, using BERT as encoder and GPT-2 as decoder, with the aim of improving interpretability and reasoning.
Contribution
It proposes a novel approach to infer explicit schema networks from language models, enhancing interpretability and reasoning by encoding semantic relations as graph structures.
Findings
The model can uncover ground-truth graphs from synthetic data.
Pretrained models can be conditioned on symbolic schema representations.
Schema networks improve commonsense reasoning performance.
Abstract
Large, pretrained language models infer powerful representations that encode rich semantic and syntactic content, albeit implicitly. In this work we introduce a novel neural language model that enforces, via inductive biases, explicit relational structures which allow for compositionality onto the output representations of pretrained language models. Specifically, the model encodes sentences into sequences of symbols (composed representations), which correspond to the nodes visited by biased random walkers on a global latent graph, and infers the posterior distribution of the latter. We first demonstrate that the model is able to uncover ground-truth graphs from artificially generated datasets of random token sequences. Next, we leverage pretrained BERT and GPT-2 language models as encoder and decoder, respectively, to infer networks of symbols (schemata) from natural language datasets.…
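To make the "biased random walkers on a global latent graph" idea concrete, here is a minimal sketch of a single walker producing a sequence of symbols (node indices). It is an illustration under stated assumptions, not the paper's model: the function name `biased_random_walk`, the per-node `bias` vector, and the toy ring graph are all hypothetical, and the transition rule (neighbor weight multiplied by a bias, then normalized) is just one simple way to bias a walk.

```python
import numpy as np


def biased_random_walk(adjacency, bias, start, length, rng):
    """Return the node indices visited by one biased walker.

    Assumption (not from the paper): the probability of stepping from
    node i to node j is proportional to adjacency[i, j] * bias[j].
    """
    adjacency = np.asarray(adjacency, dtype=float)
    bias = np.asarray(bias, dtype=float)
    walk = [start]
    for _ in range(length - 1):
        weights = adjacency[walk[-1]] * bias
        total = weights.sum()
        if total == 0:
            # Dead end: no reachable neighbor with positive weight.
            break
        next_node = rng.choice(len(weights), p=weights / total)
        walk.append(int(next_node))
    return walk


# Hypothetical toy graph: a ring of four symbols, 0-1-2-3-0.
adjacency = np.array([
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
])
bias = np.array([1.0, 2.0, 1.0, 0.5])  # hypothetical per-node preference
rng = np.random.default_rng(0)

walk = biased_random_walk(adjacency, bias, start=0, length=6, rng=rng)
print(walk)  # one symbol sequence; every step follows an edge of the graph
```

In this picture, each sentence would correspond to one such sequence of symbols, and inference would run in the opposite direction: given many sequences, recover a distribution over the graph that could have produced them.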
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Attention Dropout · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Layer Normalization · Discriminative Fine-Tuning · Weight Decay
