Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling
Qingyang Wu, Zhou Yu

TL;DR
This paper introduces a memory-augmented transformer that preserves long dialogue history efficiently, outperforming other pre-trained Transformer baselines in both efficiency and performance.
Contribution
A novel memory module integrated with pre-trained transformers enables effective long-context dialogue modeling without retraining from scratch.
Findings
Superior efficiency over baseline models
Improved dialogue context retention
Improved performance on three dialogue datasets and two language modeling datasets
Abstract
Transformer encoder-decoder models have achieved great performance in dialogue generation tasks; however, their inability to process long dialogue history often leads to truncation of the context. To address this problem, we propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models and enables efficient preservation of the dialogue history information. By incorporating a separate memory module alongside the pre-trained transformer, the model can effectively interchange information between the memory states and the current input context. We evaluate our model on three dialogue datasets and two language modeling datasets. Experimental results show that our method achieves superior efficiency and performance compared to other pre-trained Transformer baselines.
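The abstract does not spell out how the memory and the current input exchange information, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a fixed number of memory slots, single-head attention with no learned projections, and a simple read-then-write order; the names `attend`, `step`, and `rand_states` are hypothetical.

```python
import math
import random

def attend(queries, context):
    """Single-head scaled dot-product attention with no learned projections.

    Each query returns a softmax-weighted average of the context vectors.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in context]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * k[i] for w, k in zip(weights, context)) / total
                    for i in range(d)])
    return out

def step(memory, turn_states):
    """Process one dialogue turn and return (new_memory, turn_output)."""
    # Read: tokens of the current turn attend over the memory slots and the turn itself.
    turn_out = attend(turn_states, memory + turn_states)
    # Write: memory slots attend over the old memory and the updated turn.
    new_memory = attend(memory, memory + turn_out)
    return new_memory, turn_out

def rand_states(n, d):
    """Random stand-in for hidden states (assumption: real states come from the encoder)."""
    return [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]

random.seed(0)
memory = rand_states(4, 8)        # 4 memory slots, hidden size 8
for turn_len in (5, 12, 7):       # three turns of different lengths
    memory, out = step(memory, rand_states(turn_len, 8))
```

In this sketch the memory stays at 4 slots no matter how many turns are processed, so the per-turn cost depends on the turn length and the memory size rather than on the full dialogue history.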
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Softmax · Dropout · Residual Connection · Dense Connections
