Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving
Imanol Schlag, Paul Smolensky, Roland Fernandez, Nebojsa Jojic, Jürgen Schmidhuber, Jianfeng Gao

TL;DR
This paper introduces the TP-Transformer, which enhances the standard Transformer with explicit relational encoding using Tensor-Product Representations, achieving state-of-the-art results on a challenging math problem dataset.
Contribution
It proposes a novel TP-Attention mechanism that explicitly encodes relations, improving the Transformer’s ability to solve complex math word problems.
Findings
Sets a new state of the art on the Mathematics Dataset
TP-Attention explicitly encodes relations between Transformer cells
Attention maps give better insight into how the model solves problems
Abstract
We incorporate Tensor-Product Representations within the Transformer in order to better support the explicit representation of relation structure. Our Tensor-Product Transformer (TP-Transformer) sets a new state of the art on the recently-introduced Mathematics Dataset containing 56 categories of free-form math word-problems. The essential component of the model is a novel attention mechanism, called TP-Attention, which explicitly encodes the relations between each Transformer cell and the other cells from which values have been retrieved by attention. TP-Attention goes beyond linear combination of retrieved values, strengthening representation-building and resolving ambiguities introduced by multiple layers of standard attention. The TP-Transformer's attention maps give better insights into how it is capable of solving the Mathematics Dataset's challenging problems. Pretrained models…
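The idea of going "beyond linear combination of retrieved values" can be illustrated with a minimal sketch of one attention head. This is not the authors' implementation: the abstract does not state the binding operation, so the sketch assumes that the value retrieved by attention (the filler) is combined with a per-position role vector by elementwise multiplication. The function name `tp_attention_head` and the argument `roles` are hypothetical, and the queries, keys, values, and roles are passed in directly rather than computed by learned projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def tp_attention_head(queries, keys, values, roles):
    """Illustrative attention head with role binding (an assumption).

    queries, keys: lists of n vectors of dimension d.
    values, roles: lists of n vectors sharing one dimension.
    Returns one bound vector per query position.
    """
    d = len(keys[0])
    dim_v = len(values[0])
    outputs = []
    for q, r in zip(queries, roles):
        # Scaled dot-product attention weights over all positions.
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # Standard attention output: a linear combination of the values.
        filler = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(dim_v)
        ]
        # Assumed binding step: elementwise product of filler and role.
        outputs.append([f * rj for f, rj in zip(filler, r)])
    return outputs


# Usage: with all-ones roles the head reduces to standard attention.
qs = [[1.0, 0.0], [0.0, 1.0]]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 2.0], [3.0, 4.0]]
plain = tp_attention_head(qs, ks, vs, [[1.0, 1.0], [1.0, 1.0]])
bound = tp_attention_head(qs, ks, vs, [[2.0, 0.0], [0.0, -1.0]])
```

In this sketch the role vector rescales or masks each dimension of the retrieved value, so two positions that retrieve the same value can still produce different outputs. That is one way to read the abstract's claim about resolving ambiguities, under the stated assumption.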
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Games and Gamification · Teaching and Learning Programming
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
