Do Transformers know symbolic rules, and would we know if they did?

Tommi Gröndahl; Yujia Guo; N. Asokan

arXiv:2203.00162·cs.LG·March 2, 2023·1 cites

Open Access

TL;DR

This paper critically examines whether Transformers learn genuine symbolic rules. It proposes criteria for symbolic capacity, analyzes prior work, and reports experiments on T5 that probe its generalization and its potential role within a symbolic architecture.

Contribution

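The paper proposes two criteria for assessing symbolic capacity in Transformers (one concerning a system's internal architecture, the other the dissociation between abstract rules and specific input identities), critiques existing evaluations, and offers a new perspective on the role of Transformers in symbolic architectures.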

Findings

01

T5 generalizes markedly better in sequence-to-sequence tasks than in comparable classification tasks.

02

Current experiments are inconclusive about symbolic understanding due to design issues.

03

Transformers may function as part of a symbolic system without being inherently symbolic.

Abstract

To improve the explainability of leading Transformer networks used in NLP, it is important to tease apart genuine symbolic rules from merely associative input-output patterns. However, we identify several inconsistencies in how "symbolicity" has been construed in recent NLP literature. To mitigate this problem, we propose two criteria to be the most relevant, one pertaining to a system's internal architecture and the other to the dissociation between abstract rules and specific input identities. From this perspective, we critically examine prior work on the symbolic capacities of Transformers, and deem the results to be fundamentally inconclusive for reasons inherent in experiment design. We further maintain that there is no simple fix to this problem, since it arises, to an extent, in all end-to-end settings. Nonetheless, we emphasize the need for more robust evaluation of…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Software Engineering Research

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Attention Dropout · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer · WordPiece · Residual Connection