Improvement of a dedicated model for open domain persona-aware dialogue generation
Qiang Han

TL;DR
This paper explores enhancements to a Transformer-based dedicated model for open domain persona-aware dialogue generation, focusing on speed and performance improvements tailored for short multi-turn dialogues, with open-source code provided.
Contribution
It introduces specific modifications to the Transformer architecture, optimized for the short dialogue sequences used in persona-aware dialogue generation.
Findings
Improved model training efficiency and speed.
Enhanced dialogue generation quality.
Open-source implementation available.
Abstract
This paper analyzes several methods proposed in recent years for improving the speed and performance of the Transformer architecture, focusing on their application to training a dedicated model. The dedicated model studied here is an open domain persona-aware dialogue generation model, and the dataset consists of short multi-turn dialogues in which the total length of a single input sequence is no more than 105 tokens. Therefore, the many improvements to the Transformer architecture and attention mechanism that target long-sequence processing are not discussed in this paper. The source code of the experiments has been open sourced: https://github.com/ghosthamlet/persona
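As a rough illustration of why long-sequence attention optimizations matter little at this input length, the sketch below counts the entries in a single self-attention score matrix, which grows quadratically with sequence length. The function name and the 4096-token comparison length are assumptions made for illustration and are not taken from the paper or its repository.

```python
def attention_score_entries(seq_len: int, num_heads: int = 1) -> int:
    """Entries in the self-attention score matrices of one layer.

    Each head computes a seq_len x seq_len matrix of query-key scores.
    """
    return num_heads * seq_len * seq_len


# 105 tokens is the maximum input length stated in the abstract.
short = attention_score_entries(105)
# 4096 is a hypothetical long-sequence length, used only for comparison.
long = attention_score_entries(4096)

print(short)          # 11025
print(long // short)  # 1521
```

At 105 tokens the score matrix has only about eleven thousand entries per head, so the quadratic cost that sparse or linear attention variants are designed to reduce is already small.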
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Byte Pair Encoding · Dropout · Label Smoothing · Multi-Head Attention · Residual Connection · Softmax
