Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems
Hong Liu, Yucheng Cai, Zhijian Ou, Yi Huang, Junlan Feng

TL;DR
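This paper builds Markovian generative architectures over pretrained language models for task-oriented dialog systems. By treating dialog states as Markov states, the model no longer conditions on the whole dialog history at each turn, which reduces memory and computation costs and improves training efficiency in low-resource settings.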
Contribution
It proposes a Markovian generative architecture (MGA) that reduces memory and computation costs and is especially effective in low-resource training settings.
Findings
Reduces memory and time costs in rich-resource settings.
Improves training efficiency in low-resource scenarios.
Maintains performance comparable to non-Markov models.
Abstract
Recently, Transformer based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is their non-Markov architectures across turns, i.e., the whole history is used as the conditioning input at each turn. First, this brings inefficiencies in memory and computation. Furthermore, using the whole history increases model complexity and may hurt the training efficiency, especially when facing small amounts of labeled training data (the low-resource setting). In this paper, motivated by the observation that dialog states could be viewed as Markov states, we propose to build Markovian Generative Architectures (MGA) over PLM backbones for efficient TOD systems. Experiments on MultiWOZ2.1 show that in the rich-resource setting, the proposed Markov models reduce memory and time costs…
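The contrast the abstract draws between non-Markov and Markov conditioning can be illustrated with a minimal sketch. The `Turn` fields and both helper names are hypothetical. The choice to keep the previous dialog state, the previous response, and the current user utterance as the Markov context is an assumption made for illustration, not a detail stated in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One completed dialog turn (hypothetical field names)."""
    user: str      # user utterance
    state: str     # dialog state after this turn
    response: str  # system response


def non_markov_context(history: List[Turn], user: str) -> str:
    """Condition on the whole history: the input grows with every turn."""
    parts: List[str] = []
    for turn in history:
        parts += [turn.user, turn.state, turn.response]
    parts.append(user)
    return " ".join(parts)


def markov_context(history: List[Turn], user: str) -> str:
    """Condition only on the previous turn's state and response
    (assumed Markov context) plus the current user utterance."""
    parts: List[str] = []
    if history:
        prev = history[-1]
        parts += [prev.state, prev.response]
    parts.append(user)
    return " ".join(parts)


if __name__ == "__main__":
    history = [Turn(f"u{i}", f"s{i}", f"r{i}") for i in range(5)]
    print(len(non_markov_context(history, "u5").split()))  # grows with turns
    print(len(markov_context(history, "u5").split()))      # stays bounded
```

Under these assumptions the non-Markov input length grows linearly with the number of turns, while the Markov input stays bounded, which is the source of the memory and time savings the abstract describes.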
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · SentencePiece · Gated Linear Unit · Adafactor · Inverse Square Root Schedule · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer
