Partially Randomizing Transformer Weights for Dialogue Response Diversity
Jing Yang Lee, Kong Aik Lee, and Woon-Seng Gan

TL;DR
This paper introduces PaRaFormer, a simple transformer extension that improves dialogue response diversity by freezing the weights of selected layers after random initialization. It performs comparably to more complex approaches without added training difficulty or model size.
Contribution
Proposes PaRaFormer, a novel method that enhances response diversity by freezing selected transformer layer weights after random initialization.
Findings
PaRaFormer achieves response diversity comparable to more complex methods.
The approach does not increase training difficulty or model size.
Experimental results validate the effectiveness of partial weight randomization.
Abstract
Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference, or a significant increase in model size and complexity. Hence, we propose the Partially Randomized transFormer (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaFormer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in…
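The core idea in the abstract, freezing the weights of selected layers after random initialization, can be illustrated with a toy sketch. This is not the authors' implementation: it uses a small stack of linear layers in pure Python rather than a transformer, and the names `Linear`, `train_step`, and `trainable`, the initialization scale, the squared-error loss, and the choice of which layer to freeze are all assumptions made for illustration.

```python
import copy
import random


class Linear:
    """Toy dense layer y = W x (no bias). Hypothetical helper, not from the paper."""

    def __init__(self, n_in, n_out, rng):
        # Random initialization; the 1/sqrt(n_in) scale is an arbitrary choice here.
        scale = n_in ** -0.5
        self.weight = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                       for _ in range(n_out)]
        self.trainable = True  # set to False to freeze the layer at its initial values

    def forward(self, x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weight]


def train_step(layers, x, target, lr=0.01):
    """One SGD step on 0.5 * ||output - target||^2, skipping frozen layers."""
    # Forward pass, caching the input of every layer for the backward pass.
    inputs = []
    h = x
    for layer in layers:
        inputs.append(h)
        h = layer.forward(h)

    # Gradient of the loss with respect to the network output.
    delta = [hi - ti for hi, ti in zip(h, target)]
    loss = 0.5 * sum(d * d for d in delta)

    # Backward pass. Gradients still flow *through* a frozen layer to the
    # layers before it; only the frozen layer's own weights are left untouched.
    for layer, layer_in in zip(reversed(layers), reversed(inputs)):
        # Gradient with respect to this layer's input, using pre-update weights.
        next_delta = [sum(layer.weight[r][c] * delta[r] for r in range(len(delta)))
                      for c in range(len(layer_in))]
        if layer.trainable:
            for r, d in enumerate(delta):
                for c, xi in enumerate(layer_in):
                    layer.weight[r][c] -= lr * d * xi
        delta = next_delta
    return loss


rng = random.Random(0)
layers = [Linear(4, 4, rng) for _ in range(3)]
layers[1].trainable = False  # freeze the middle layer right after random init

frozen_before = copy.deepcopy(layers[1].weight)
trainable_before = copy.deepcopy(layers[0].weight)

x = [1.0, -0.5, 0.25, 2.0]
target = [0.0, 1.0, 0.0, -1.0]
losses = [train_step(layers, x, target) for _ in range(20)]
```

After the loop, `layers[1].weight` is still identical to `frozen_before`, while the first and last layers have moved away from their initial values. Because the frozen layer stores no optimizer state and receives no updates, the parameter count of the model is unchanged, which matches the abstract's claim that the approach adds no model size.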
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer · Adam
