Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task

Alsu Sagirova; Mikhail Burtsev

arXiv:2406.14213·cs.CL·June 21, 2024

TL;DR

This paper studies a symbolic working memory added to the Transformer decoder for machine translation. The memory improves translation quality and stores keywords of the translated text, and the diversity of its content correlates with corpus complexity.

Contribution

It introduces a neural-symbolic working memory component to Transformers and demonstrates its impact on translation quality and memory content relevance.

Findings

01

Memory content includes keywords from translated text

02

Memory diversity correlates with corpus complexity

03

Memory enhances translation accuracy

Abstract

Even though Transformers are extensively used for Natural Language Processing tasks, especially for machine translation, they lack an explicit memory to store key concepts of processed texts. This paper explores the properties of the content of a symbolic working memory added to the Transformer model decoder. Such working memory enhances the quality of model predictions in the machine translation task and works as a neural-symbolic representation of information that is important for the model to make correct translations. The study of memory content revealed that keywords of the translated text are stored in the working memory, pointing to the relevance of memory content to the processed text. Also, the diversity of tokens and parts of speech stored in memory correlates with the complexity of the corpora for the machine translation task.
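The abstract refers to the "diversity of tokens" stored in memory without stating how it is measured. As an illustration only, the sketch below computes two common diversity measures over a list of memory tokens: the type-token ratio and the Shannon entropy. The function name `token_diversity` and the sample token lists are hypothetical, and these measures are assumptions chosen for the example, not necessarily the ones used in the paper.

```python
import math
from collections import Counter


def token_diversity(memory_tokens):
    """Return (type-token ratio, Shannon entropy in bits) for a token list.

    Both measures are illustrative assumptions, not the paper's metric.
    """
    if not memory_tokens:
        return 0.0, 0.0
    counts = Counter(memory_tokens)
    total = len(memory_tokens)
    # Type-token ratio: distinct tokens divided by all tokens.
    ttr = len(counts) / total
    # Shannon entropy of the empirical token distribution.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return ttr, entropy


# Hypothetical memory contents for two corpora of different complexity.
simple_memory = ["the", "the", "the", "cat"]
varied_memory = ["contract", "liability", "court", "appeal"]

print(token_diversity(simple_memory))
print(token_diversity(varied_memory))
```

Under either measure, a memory that repeats a few tokens scores lower than one holding many distinct tokens, which is the kind of difference one would compare across corpora.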

Taxonomy

Methods

Linear Layer · Multi-Head Attention · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam