Join-Chain Network: A Logical Reasoning View of the Multi-head Attention in Transformer

Jianyi Zhang; Yiran Chen; Jianshu Chen

arXiv:2210.02729 · cs.CL · October 25, 2022

TL;DR

This paper introduces a symbolic reasoning architecture that chains join operators together to model logical expressions. It shows that the transformer's multi-head attention acts as a neural approximation of the join operator, which offers new insight into how transformers reason.

Contribution

The paper proposes a symbolic reasoning framework based on join-chains and shows that multi-head attention can be interpreted as a neural join operator, linking logical reasoning to transformer mechanisms.

Findings

01

An ensemble of join-chains can express a broad subset of tree-structured first-order logical expressions, named FOET.

02

Multi-head self-attention can be understood as a neural operator that implements the union bound of the join operator.

03

The analysis provides new insights into the reasoning mechanism of pretrained models such as BERT.

Abstract

Developing neural architectures that are capable of logical reasoning has become increasingly important for a wide range of applications (e.g., natural language processing). Towards this grand objective, we propose a symbolic reasoning architecture that chains many join operators together to model output logical expressions. In particular, we demonstrate that such an ensemble of join-chains can express a broad subset of "tree-structured" first-order logical expressions, named FOET, which is particularly useful for modeling natural languages. To endow it with differentiable learning capability, we closely examine various neural operators for approximating the symbolic join-chains. Interestingly, we find that the widely used multi-head self-attention module in transformer can be understood as a special neural operator that implements the union bound of the join operator in probabilistic…
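For readers unfamiliar with the two ideas the abstract combines, the following is a minimal sketch and not the paper's own formulation. It assumes a join of two binary predicates means "there exists a z with r(x, z) and s(z, y)", and that the probabilistic version treats the per-z events as independent. The function names (`join`, `exact_soft_join`, `union_bound_join`) and the toy data are hypothetical.

```python
from itertools import product
from math import prod


def join(r, s):
    """Symbolic join: (x, y) holds if some z satisfies r(x, z) and s(z, y)."""
    return {(x, y) for (x, z1), (z2, y) in product(r, s) if z1 == z2}


def exact_soft_join(p_r, p_s):
    """Probability that at least one z links x to y.

    Assumes (for illustration only) that the events are independent,
    so P(exists z) = 1 - prod_z (1 - p_r[x][z] * p_s[z][y]).
    """
    n, m, k = len(p_r), len(p_s), len(p_s[0])
    return [[1.0 - prod(1.0 - p_r[x][z] * p_s[z][y] for z in range(m))
             for y in range(k)] for x in range(n)]


def union_bound_join(p_r, p_s):
    """Union bound: P(exists z) <= sum_z p_r[x][z] * p_s[z][y], capped at 1.

    The weighted sum over z has the same shape as an attention-weighted
    sum of values, which is the resemblance the abstract points to.
    """
    n, m, k = len(p_r), len(p_s), len(p_s[0])
    return [[min(1.0, sum(p_r[x][z] * p_s[z][y] for z in range(m)))
             for y in range(k)] for x in range(n)]


# Symbolic example: chaining "parent" with itself yields "grandparent".
parent = {("ann", "bob"), ("bob", "cat")}
grandparent = join(parent, parent)
print(grandparent)  # {('ann', 'cat')}

# Soft example: each row of p_r is a distribution over z, like attention weights.
p_r = [[0.9, 0.1], [0.2, 0.8]]
p_s = [[0.5], [0.5]]
exact = exact_soft_join(p_r, p_s)
bound = union_bound_join(p_r, p_s)
```

In this toy setting the union bound is never smaller than the exact probability, and it replaces a product over z with a sum, which is much easier to differentiate through.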

Taxonomy

Topics: Neural Networks and Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · WordPiece · Adam · Dense Connections · Weight Decay · Dropout · Linear Warmup With Linear Decay · Layer Normalization