Relational Weight Priors in Neural Networks for Abstract Pattern Learning and Language Modelling
Radha Kopparti, Tillman Weyde

TL;DR
This paper introduces ERBP, a Bayesian relational prior that enhances neural networks' ability to learn abstract patterns, improving generalisation and performance in NLP and sequence tasks.
Contribution
ERBP provides a novel relational inductive bias as a Bayesian prior, improving neural networks' systematic generalisation on abstract pattern learning tasks.
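As a rough illustration of what a weight prior centred on non-zero values means in practice, the sketch below computes the penalty that a Gaussian prior adds to a training loss. It is a generic construction, not the paper's implementation: the function name `prior_penalty`, the `sigma` parameter, and the single difference-comparison unit used as the default weights are all assumptions made for this example.

```python
def prior_penalty(weights, default_weights, sigma=1.0):
    """Negative log of a Gaussian prior N(default_weights, sigma^2), up to a constant.

    Adding this term to the data loss pulls the weights towards
    default_weights instead of towards zero, as plain L2 decay would.
    """
    total = 0.0
    for row_w, row_d in zip(weights, default_weights):
        for w, d in zip(row_w, row_d):
            total += (w - d) ** 2
    return total / (2.0 * sigma ** 2)


# Hypothetical default: one unit that compares two inputs by their difference
# (weights +1 and -1), so its output is zero when the inputs are equal.
default = [[1.0, -1.0]]

at_default = prior_penalty([[1.0, -1.0]], default)  # weights match the prior mean
at_zero = prior_penalty([[0.0, 0.0]], default)      # weights ignore the relation
```

Under this sketch the penalty is lowest when the weights already encode the comparison, and a smaller `sigma` makes the prior stronger.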
Findings
ERBP achieves near-perfect generalisation on synthetic abstract pattern tasks.
ERBP improves natural language and melody prediction tasks.
ERBP outperforms RBP and standard networks across multiple benchmarks.
Abstract
Deep neural networks have become the dominant approach in natural language processing (NLP). However, in recent years, it has become apparent that there are shortcomings in systematicity that limit the performance and data efficiency of deep learning in NLP. These shortcomings can be clearly shown in lower-level artificial tasks, mostly on synthetic data. Abstract patterns are the best known examples of a hard problem for neural networks in terms of generalisation to unseen data. They are defined by relations between items, such as equality, rather than their values. It has been argued that these low-level problems demonstrate the inability of neural networks to learn systematically. In this study, we propose Embedded Relation Based Patterns (ERBP) as a novel way to create a relational inductive bias that encourages learning equality and distance-based relations for abstract patterns.…
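To make the notion of an abstract pattern concrete, here is a minimal sketch assuming three-item sequences of the ABA and ABB kind with disjoint training and test vocabularies. The helper names and the syllable vocabularies are hypothetical and are not taken from the paper.

```python
import random


def make_pattern(kind, vocab, rng):
    """Build a sequence for a pattern string such as 'ABA' from two distinct items."""
    a, b = rng.sample(vocab, 2)  # sample without replacement, so a != b
    mapping = {"A": a, "B": b}
    return [mapping[ch] for ch in kind]


def equality_signature(seq):
    """Pairwise equality relations between positions, independent of item values."""
    n = len(seq)
    return tuple(seq[i] == seq[j] for i in range(n) for j in range(i + 1, n))


rng = random.Random(0)
train_vocab = ["ba", "di", "ko", "lu"]
test_vocab = ["fe", "mi", "wo", "ta"]  # no overlap with train_vocab

train_aba = make_pattern("ABA", train_vocab, rng)
test_aba = make_pattern("ABA", test_vocab, rng)
test_abb = make_pattern("ABB", test_vocab, rng)

# Both ABA sequences share one signature although they share no items;
# the ABB sequence has a different signature.
same_relation = equality_signature(train_aba) == equality_signature(test_aba)
different_relation = equality_signature(test_aba) != equality_signature(test_abb)
```

A model that has learned the pattern systematically should treat `train_aba` and `test_aba` alike, because only the equality relations between positions define the pattern.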
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques
Methods: Tanh Activation · Sigmoid Activation · Gated Recurrent Unit · Long Short-Term Memory
