Improvement of a dedicated model for open domain persona-aware dialogue generation

Qiang Han

arXiv:2008.11970·cs.CL·August 28, 2020

TL;DR

This paper explores enhancements to a Transformer-based dedicated model for open domain persona-aware dialogue generation, focusing on speed and performance improvements tailored for short multi-turn dialogues, with open-source code provided.

Contribution

It introduces specific modifications to the Transformer architecture, optimized for the short dialogue sequences found in persona-aware dialogue generation.

Findings

01 Improved model training efficiency and speed.

02 Enhanced dialogue generation quality.

03 Open-source implementation available.

Abstract

This paper analyzes several methods proposed in recent years for improving the speed and performance of the Transformer architecture, focusing on their application to training a dedicated model. The dedicated model studied here is an open domain persona-aware dialogue generation model, and the dataset consists of short multi-turn dialogues: the total length of a single input sequence is no more than 105 tokens. Therefore, the many improvements to the Transformer architecture and attention mechanism that target long-sequence processing are not discussed in this paper. The source code of the experiments has been open sourced: https://github.com/ghosthamlet/persona
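To illustrate why long-sequence attention optimizations matter little at this input length, the following rough sketch compares the part of a Transformer layer's cost that grows quadratically with sequence length against the part that grows linearly. The function name `layer_cost_breakdown` is hypothetical, and the sizes `d_model=512` and `d_ff=2048` are assumptions borrowed from the standard base Transformer configuration, not values taken from this paper. The count covers only the main matrix multiplications and ignores softmax, biases, and normalization.

```python
def layer_cost_breakdown(seq_len, d_model=512, d_ff=2048):
    """Approximate multiply-accumulate counts for one Transformer encoder layer.

    d_model and d_ff are assumed values (base Transformer), not from the paper.
    """
    # Terms that scale with seq_len**2: the QK^T score matrix and the
    # attention-weighted sum over V.
    quadratic = 2 * seq_len**2 * d_model
    # Terms that scale linearly with seq_len: the Q, K, V and output
    # projections, plus the two position-wise feed-forward layers.
    projections = 4 * seq_len * d_model**2
    feed_forward = 2 * seq_len * d_model * d_ff
    linear = projections + feed_forward
    return {
        "quadratic": quadratic,
        "linear": linear,
        "quadratic_share": quadratic / (quadratic + linear),
    }


if __name__ == "__main__":
    # 105 is the maximum input length stated in the abstract; the longer
    # lengths are included only for contrast.
    for n in (105, 1024, 4096):
        share = layer_cost_breakdown(n)["quadratic_share"]
        print(f"seq_len={n:5d}  quadratic share of layer cost: {share:.1%}")
```

Under these assumptions the quadratic term is only a few percent of the layer cost at 105 tokens, while it becomes the dominant term at a few thousand tokens, which is the regime that long-sequence attention variants target.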

Code & Models

Repositories

ghosthamlet/persona (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Dropout · Label Smoothing · Multi-Head Attention · Residual Connection · Softmax