Controllable Text Generation with Residual Memory Transformer
Hanqing Zhang, Sun Si, Haiming Wu, Dawei Song

TL;DR
This paper introduces the Residual Memory Transformer (RMT), a lightweight control plugin for large-scale causal language models that enables flexible, efficient, and general controllable text generation at arbitrary time steps.
Contribution
The paper proposes a non-intrusive Residual Memory Transformer plugin that enhances controllable text generation by cooperating with existing language models through residual learning.
Findings
RMT outperforms state-of-the-art methods in control tasks
RMT demonstrates high versatility across different control conditions
Experiments confirm RMT's effectiveness and efficiency
Abstract
Large-scale Causal Language Models (CLMs), e.g., GPT3 and ChatGPT, have brought great success in text generation. However, it is still an open challenge to control the generation process of a CLM while balancing flexibility, control granularity, and generation efficiency. In this paper, we provide a new alternative for controllable text generation (CTG), by designing a non-intrusive, lightweight control plugin to accompany the generation of the CLM at arbitrary time steps. The proposed control plugin, namely Residual Memory Transformer (RMT), has an encoder-decoder setup, which can accept any type of control condition and cooperate with the CLM through a residual learning paradigm, to achieve a more flexible, general, and efficient CTG. Extensive experiments are carried out on various control tasks, in the form of both automatic and human evaluations. The results show the superiority of RMT…
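The residual cooperation described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. It assumes, purely for illustration, that the plugin's output is added to the frozen base model's next-token logits; the abstract does not say where the residual is applied. The names `residual_combine`, `plugin_residual`, and `use_plugin` are hypothetical, and the numbers are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def residual_combine(base_logits, plugin_residual, use_plugin=True):
    """Add a control plugin's residual to a frozen base model's logits.

    Assumption (not from the paper's abstract): the residual is applied
    at the logit level. With use_plugin=False the base model's output is
    returned unchanged, which mimics switching control on or off at any
    generation step without touching the base model.
    """
    if not use_plugin:
        return list(base_logits)
    if len(base_logits) != len(plugin_residual):
        raise ValueError("base logits and residual must have the same length")
    return [b + r for b, r in zip(base_logits, plugin_residual)]


# Toy three-word vocabulary with made-up scores.
vocab = ["good", "bad", "okay"]
base_logits = [1.0, 1.0, 0.5]        # hypothetical frozen CLM output
plugin_residual = [2.0, -2.0, 0.0]   # hypothetical plugin output for a "positive" condition

uncontrolled = softmax(residual_combine(base_logits, plugin_residual, use_plugin=False))
controlled = softmax(residual_combine(base_logits, plugin_residual))

for word, p_base, p_ctrl in zip(vocab, uncontrolled, controlled):
    print(f"{word}: base={p_base:.3f} controlled={p_ctrl:.3f}")
```

In this sketch the base model's parameters and outputs are never modified; control comes only from the added term, which is one way to read the "non-intrusive" property the abstract claims.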
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Absolute Position Encodings · Dense Connections · Layer Normalization · Byte Pair Encoding
