DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence
Kai-Po Chang, Wei-Yun Ma

TL;DR
The paper introduces DCT, a transformer framework that efficiently models unbounded sequences by selectively retaining compressed sentence representations, outperforming previous models on the Enwik8 benchmark.
Contribution
It presents a novel memory management policy for transformers that improves the handling of unboundedly long sequences through selective compression and retention.
Findings
DCT outperforms the previous state-of-the-art (SOTA) model on the Enwik8 benchmark.
Selective memory retention improves sequence modeling.
Compressed memory maintains semantic information effectively.
Abstract
In this paper, we propose the Dynamic Compressive Transformer (DCT), a transformer-based framework for modeling unbounded sequences. Whereas previous baselines append every sentence representation to memory, conditionally selecting which representations to append is a more reasonable way to handle unboundedly long sequences. Our model uses a policy that determines whether a sequence should be kept in memory in a compressed state or discarded during the training process. By retaining semantically meaningful sentence information in the memory system, DCT outperforms the previous state-of-the-art (SOTA) model in our experiments on the Enwik8 benchmark.
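The keep-or-discard idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the policy or the compression works, so the mean-pooling compressor, the norm-threshold policy, the oldest-first eviction, and every name below (`mean_pool_compress`, `SelectiveMemory`, `norm_policy`, `rate`, `max_slots`) are assumptions chosen only to show the control flow.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool_compress(states: Sequence[Vector], rate: int) -> List[Vector]:
    """Average every `rate` consecutive vectors (assumed stand-in compressor)."""
    compressed = []
    for start in range(0, len(states), rate):
        chunk = states[start:start + rate]
        dim = len(chunk[0])
        compressed.append(
            [sum(vec[d] for vec in chunk) / len(chunk) for d in range(dim)]
        )
    return compressed


@dataclass
class SelectiveMemory:
    """Memory that stores a segment only when a policy says to keep it."""
    keep_policy: Callable[[Sequence[Vector]], bool]  # hypothetical policy hook
    rate: int = 2        # assumed compression rate
    max_slots: int = 8   # assumed memory budget
    slots: List[Vector] = field(default_factory=list)

    def update(self, segment_states: Sequence[Vector]) -> bool:
        """Compress and append the segment if kept; return whether it was kept."""
        if not segment_states or not self.keep_policy(segment_states):
            return False  # discarded: memory is left unchanged
        self.slots.extend(mean_pool_compress(segment_states, self.rate))
        overflow = len(self.slots) - self.max_slots
        if overflow > 0:
            # Assumption: evict the oldest slots first to respect the budget.
            del self.slots[:overflow]
        return True


def norm_policy(states: Sequence[Vector]) -> bool:
    """Hand-written stand-in for a learned policy: keep high-norm segments."""
    mean_norm = sum(math.sqrt(sum(x * x for x in v)) for v in states) / len(states)
    return mean_norm > 1.0


memory = SelectiveMemory(keep_policy=norm_policy, rate=2, max_slots=3)
kept_first = memory.update([[2.0, 0.0], [0.0, 2.0], [4.0, 0.0], [0.0, 4.0]])
kept_second = memory.update([[0.1, 0.0], [0.0, 0.1]])
```

In this sketch the first segment is compressed from four vectors to two and stored, while the second is discarded and leaves the memory untouched. A baseline that appends every segment corresponds to a policy that always returns `True`.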
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Algorithms and Data Compression
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Cosine Annealing · Residual Connection · Absolute Position Encodings · Compressed Memory · Adam · Linear Warmup With Cosine Annealing · Softmax
