Partially Randomizing Transformer Weights for Dialogue Response Diversity

Jing Yang Lee, Kong Aik Lee, and Woon-Seng Gan

arXiv:2311.10943 · cs.CL · November 21, 2023 · 1 citation

TL;DR

This paper introduces PaRaFormer, a simple transformer extension that improves dialogue response diversity by freezing the weights of selected layers after random initialization. It performs comparably to more complex diversity-oriented approaches without adding training difficulty or increasing model size.

Contribution

Proposes PaRaFormer, a novel method that enhances response diversity by freezing selected transformer layer weights after random initialization.

Findings

1. PaRaFormer achieves response diversity comparable to more complex methods.
2. The approach does not increase training difficulty or model size.
3. Experimental results validate the effectiveness of partial weight randomization.

Abstract

Despite recent progress in generative open-domain dialogue, the issue of low response diversity persists. Prior works have addressed this issue via either novel objective functions, alternative learning approaches such as variational frameworks, or architectural extensions such as the Randomized Link (RL) Transformer. However, these approaches typically entail either additional difficulties during training/inference, or a significant increase in model size and complexity. Hence, we propose the Partially Randomized transFormer (PaRaFormer), a simple extension of the transformer which involves freezing the weights of selected layers after random initialization. Experimental results reveal that the performance of the PaRaFormer is comparable to that of the aforementioned approaches, despite not entailing any additional training difficulty or increase in…
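
The core idea in the abstract, freezing the weights of selected layers right after random initialization so that only the remaining layers are trained, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the `Layer` class, the layer names, the Gaussian initialization, the choice of which layer to freeze, and the plain SGD update are all hypothetical stand-ins chosen for illustration, since this summary does not say which layers PaRaFormer freezes or how it is trained.

```python
import random


class Layer:
    """Toy layer: a flat list of weights plus a flag marking it frozen."""

    def __init__(self, name, size, rng):
        self.name = name
        # Random initialization (the Gaussian scale is an assumption).
        self.weights = [rng.gauss(0.0, 0.02) for _ in range(size)]
        self.frozen = False


def freeze_selected(layers, frozen_names):
    """Mark the chosen layers as frozen immediately after initialization."""
    for layer in layers:
        if layer.name in frozen_names:
            layer.frozen = True


def sgd_step(layers, grads, lr=0.1):
    """Apply one SGD update, leaving frozen layers at their random values."""
    for layer in layers:
        if layer.frozen:
            continue
        g = grads[layer.name]
        layer.weights = [w - lr * gi for w, gi in zip(layer.weights, g)]


rng = random.Random(0)
# Hypothetical layer names; the real selection is not given in this summary.
layers = [Layer(n, 4, rng) for n in ("attn_proj", "ffn_in", "ffn_out")]
freeze_selected(layers, {"ffn_in"})

before = {l.name: list(l.weights) for l in layers}
grads = {l.name: [1.0] * 4 for l in layers}  # dummy gradients for the demo
sgd_step(layers, grads)
after = {l.name: list(l.weights) for l in layers}

for l in layers:
    status = "frozen" if l.frozen else "updated"
    print(l.name, status, after[l.name] == before[l.name])
```

In this sketch the frozen layer keeps its randomly initialized weights across updates while the other layers move, which mirrors the claim that the method adds no parameters and no extra training machinery. In a deep-learning framework the same effect is usually obtained by excluding the selected parameters from gradient computation or from the optimizer.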

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer · Adam