Attending to Mathematical Language with Transformers

Artit Wangperawong

arXiv:1812.02825 · cs.CL · September 17, 2019 · 21 cites

TL;DR

This paper explores transformer-based neural networks for understanding and evaluating mathematical expressions, demonstrating high accuracy in symbolic computation tasks and learning fundamental arithmetic operations.

Contribution

It trains and compares three transformer-based models (the transformer, the universal transformer, and the adaptive universal transformer) on mathematical language understanding, highlighting their effectiveness in symbolic evaluation tasks.

Findings

01 Models achieved test accuracies of up to 84.9% in expression evaluation

02 Models learned basic arithmetic operations on decimal numbers

03 Incorrect inferences differed from the targets by only one or two characters

Abstract

Mathematical expressions were generated, evaluated and used to train neural network models based on the transformer architecture. The expressions and their targets were analyzed as a character-level sequence transduction task in which the encoder and decoder are built on attention mechanisms. Three models were trained to understand and evaluate symbolic variables and expressions in mathematics: (1) the self-attentive and feed-forward transformer without recurrence or convolution, (2) the universal transformer with recurrence, and (3) the adaptive universal transformer with recurrence and adaptive computation time. The models respectively achieved test accuracies as high as 76.1%, 78.8% and 84.9% in evaluating the expressions to match the target values. For the cases inferred incorrectly, the results differed from the targets by only one or two characters. The models notably learned to…
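The character-level framing described in the abstract can be illustrated with a minimal sketch. The expression format (`x=..;y=..;x+y`), the value ranges, and the helper names `make_example`, `to_chars`, and `char_mismatches` are all hypothetical choices made for illustration; this summary does not give the paper's actual data format or evaluation code.

```python
import random


def make_example(rng):
    """Generate one (source, target) pair. The format is hypothetical, not the paper's."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    op = rng.choice("+-*")
    # Source: two variable assignments followed by an expression over them.
    source = f"x={a};y={b};x{op}y"
    # Target: the evaluated result, written out as a string.
    value = {"+": a + b, "-": a - b, "*": a * b}[op]
    return source, str(value)


def to_chars(text):
    """Split a string into single-character tokens for a sequence model."""
    return list(text)


def char_mismatches(pred, target):
    """Count differing positions, plus any difference in length."""
    positional = sum(p != t for p, t in zip(pred, target))
    return positional + abs(len(pred) - len(target))


rng = random.Random(0)
source, target = make_example(rng)
print(to_chars(source), "->", to_chars(target))
```

Under this sketch, a prediction counts as correct only when it matches the target string exactly, and `char_mismatches` gives one simple way to express how far an incorrect prediction is from its target in characters.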

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Neural Networks and Applications

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Attention Dropout · Universal Transformer · Byte Pair Encoding · Dense Connections · Label Smoothing