Dodo: Dynamic Contextual Compression for Decoder-only LMs

Guanghui Qin; Corby Rosset; Ethan C. Chau; Nikhil Rao; Benjamin Van Durme

arXiv:2310.02409·cs.CL·December 10, 2024

TL;DR

Dodo is a dynamic context compression method for decoder-only language models. It reduces the time and space cost of self-attention while retaining capabilities in language modeling, question answering, and summarization.

Contribution

Dodo represents text with a dynamic number of hidden states at each layer rather than one vector per token, which compresses the context, and off-the-shelf models such as LLaMA can be adapted to it with parameter-efficient tuning methods such as LoRA.

Findings

01 In the autoencoding task, a 20x context compression ratio with a reconstruction BLEU score of 98%

02 Reduces the time and space cost of self-attention to a fraction of that of a standard transformer

03 Retains capabilities in language modeling, question answering, and summarization

Abstract

Transformer-based language models (LMs) are inefficient in long contexts. We propose Dodo, a solution for context compression. Instead of one vector per token in a standard transformer model, Dodo represents text with a dynamic number of hidden states at each layer, reducing the cost of self-attention to a fraction of typical time and space. Moreover, off-the-shelf models such as LLaMA can be adapted to Dodo by efficient parameter tuning methods such as LoRA. In use, Dodo can act as either an autoregressive LM or a context compressor for downstream tasks. We demonstrate through experiments in language modeling, question answering, and summarization that Dodo retains capabilities in these tasks, while drastically reducing the overhead during decoding. For example, in the autoencoding task, Dodo shrinks context at a 20x compression ratio with a BLEU score of 98% for reconstruction,…
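
To make the 20x figure concrete, the following is a rough back-of-the-envelope sketch, not the authors' implementation. It assumes that a context of n tokens is kept as about n / r hidden states at every layer for compression ratio r, and that the attention work for each newly decoded token grows linearly with the number of cached states. The function names and the 2,000-token context are hypothetical, chosen only for illustration.

```python
import math


def compressed_length(num_tokens: int, ratio: float) -> int:
    """Hidden states kept per layer for num_tokens of context.

    Assumption (not from the paper's code): a compression ratio r keeps
    roughly num_tokens / r states, rounded up.
    """
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    return math.ceil(num_tokens / ratio)


def relative_attention_cost(num_tokens: int, ratio: float) -> float:
    """Fraction of cached positions a new token attends to, versus no compression.

    Assumes per-step attention cost is linear in the number of cached states.
    """
    return compressed_length(num_tokens, ratio) / num_tokens


if __name__ == "__main__":
    context_tokens = 2000   # hypothetical context length
    ratio = 20              # the compression ratio quoted in the abstract
    print(compressed_length(context_tokens, ratio))        # states kept per layer
    print(relative_attention_cost(context_tokens, ratio))  # fraction of full cost
```

Under these assumptions, the cached states and the per-token attention work both shrink by roughly the compression ratio. The savings reported in the paper depend on details that this arithmetic ignores.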

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis