DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence

Kai-Po Chang; Wei-Yun Ma

arXiv:2110.04821 · cs.CL · October 12, 2021

TL;DR

The paper introduces DCT, a transformer framework that efficiently models unbounded sequences by selectively retaining compressed sentence representations, outperforming previous models on the Enwik8 benchmark.

Contribution

It presents a memory management policy for transformers that handles unboundedly long sequences by selectively compressing and retaining sentence representations instead of appending all of them to memory.

Findings

1. DCT outperforms previous SOTA on Enwik8.

2. Selective memory retention improves sequence modeling.

3. Compressed memory maintains semantic information effectively.

Abstract

In this paper, we propose Dynamic Compressive Transformer (DCT), a transformer-based framework for modeling unbounded sequences. Whereas previous baselines append every sentence representation to memory, we argue that conditionally selecting and appending them is a more reasonable way to deal with unboundedly long sequences. Our model uses a policy that, during the training process, determines whether a sequence should be kept in memory in a compressed state or discarded. By retaining semantically meaningful sentence information in the memory system, DCT outperforms the previous state-of-the-art (SOTA) model in our experiments on the Enwik8 benchmark.
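
The keep-or-discard idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the policy is trained or which compression function is used, so the names `mean_pool`, `update_memory`, and `keep_score`, the fixed `threshold`, and the choice of mean pooling as the compressor are all assumptions made for illustration. The sketch only shows the control flow of compressing a segment and appending it to a bounded memory when a scoring function says it is worth keeping.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

Vector = List[float]


def mean_pool(segment: Sequence[Vector], rate: int) -> List[Vector]:
    """Compress a segment by averaging every `rate` consecutive vectors.

    Mean pooling is an assumed stand-in for the compression step.
    """
    compressed: List[Vector] = []
    for start in range(0, len(segment), rate):
        chunk = segment[start:start + rate]
        dim = len(chunk[0])
        compressed.append(
            [sum(vec[i] for vec in chunk) / len(chunk) for i in range(dim)]
        )
    return compressed


def update_memory(
    memory: Deque[Vector],
    segment: Sequence[Vector],
    keep_score: Callable[[Sequence[Vector]], float],
    threshold: float = 0.5,
    rate: int = 2,
) -> bool:
    """Append the compressed segment to memory only if the policy keeps it.

    `keep_score` is a hypothetical placeholder for the policy described in
    the abstract; here it is any function returning a score in [0, 1].
    Returns True if the segment was kept, False if it was discarded.
    """
    if keep_score(segment) >= threshold:
        # A deque with maxlen silently evicts the oldest entries when full.
        memory.extend(mean_pool(segment, rate))
        return True
    return False


if __name__ == "__main__":
    memory: Deque[Vector] = deque(maxlen=4)

    # Toy scoring rule (an assumption): keep segments whose first value is positive.
    def toy_score(seg: Sequence[Vector]) -> float:
        return 1.0 if seg[0][0] > 0 else 0.0

    kept = update_memory(memory, [[1.0, 1.0], [3.0, 3.0]], toy_score)
    dropped = update_memory(memory, [[-1.0, 0.0], [0.0, 0.0]], toy_score)
    print(kept, dropped, list(memory))
```

In contrast, a baseline that appends every segment would call `memory.extend(...)` unconditionally, so the bounded memory would fill with whatever arrived most recently rather than with the segments the policy judged worth keeping.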

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Algorithms and Data Compression

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Absolute Position Encodings · Compressed Memory · Adam · Linear Warmup With Cosine Annealing · Softmax