Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling

Qingyang Wu; Zhou Yu

arXiv:2209.07634 · cs.CL · May 24, 2023

TL;DR

This paper introduces a memory-augmented transformer that preserves long dialogue history efficiently and outperforms other pre-trained Transformer baselines in both efficiency and performance on three dialogue datasets and two language modeling datasets.

Contribution

A novel memory module integrated with pre-trained transformers enables effective long-context dialogue modeling without retraining from scratch.

Findings

01 Superior efficiency over baseline models

02 Improved dialogue context retention

03 Enhanced performance on multiple datasets

Abstract

Transformer encoder-decoder models have achieved great performance in dialogue generation tasks. However, their inability to process long dialogue history often leads to truncation of the context. To address this problem, we propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models and enables efficient preservation of the dialogue history information. By incorporating a separate memory module alongside the pre-trained transformer, the model can effectively interchange information between the memory states and the current input context. We evaluate our model on three dialogue datasets and two language modeling datasets. Experimental results show that our method achieves superior efficiency and performance compared to other pre-trained Transformer baselines.
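
The exchange between memory states and the current input described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a fixed number of memory slots, uses plain scaled dot-product attention with no learned projections, and every name in it (`attend`, `step`, `SLOTS`, `D`) is hypothetical. It only shows why the cost per turn stays bounded: each turn attends over the memory plus the turn itself rather than the full dialogue history.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values):
    """Scaled dot-product attention; keys and values are the same vectors here."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, keys_values)) for i in range(d)])
    return out


def step(memory, turn):
    """Process one dialogue turn against a fixed-size memory (assumed design).

    Read:  turn tokens attend over memory slots plus the turn itself.
    Write: memory slots attend over the same context to form the next state.
    """
    context = memory + turn
    turn_out = attend(turn, context)
    new_memory = attend(memory, context)
    return turn_out, new_memory


random.seed(0)
D, SLOTS = 4, 3  # toy hidden size and number of memory slots
memory = [[random.gauss(0, 1) for _ in range(D)] for _ in range(SLOTS)]

# Three turns of different lengths, represented as random token vectors.
dialogue = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(n)] for n in (5, 2, 7)]

for turn in dialogue:
    # The memory carries state across turns, so earlier turns are never re-encoded.
    turn_out, memory = step(memory, turn)
```

In this sketch the memory keeps the same shape after every turn, so attention cost depends on the slot count and the current turn length, not on how many turns came before.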

Code & Models

Repositories

qywu/memformers (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Residual Connection · Dense Connections