Building Markovian Generative Architectures over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems

Hong Liu; Yucheng Cai; Zhijian Ou; Yi Huang; Junlan Feng

arXiv:2204.06452 · cs.CL · October 17, 2022 · 5 cites



TL;DR

This paper introduces Markovian architectures over pretrained language models for task-oriented dialog systems. By exploiting the Markov property of dialog states, these architectures reduce memory and computation costs and improve training efficiency in low-resource scenarios.

Contribution

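It proposes Markovian Generative Architectures (MGA), which condition each turn on the previous dialog state rather than on the whole dialog history. This reduces memory and computation costs and is especially effective in low-resource training settings.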
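The contrast between the two kinds of architecture can be written as conditional distributions. This is a sketch in notation introduced here, not taken from the paper: $u_t$, $b_t$ and $r_t$ denote the user utterance, dialog (belief) state and system response at turn $t$. The Markov conditioning set shown (previous state, previous response, current utterance) is an assumption; the paper's exact inputs, for example whether dialog acts or database results are included, may differ.

```latex
% Non-Markov: the whole history is the conditioning input at turn t
p_\theta(b_t, r_t \mid u_1, r_1, \dots, u_{t-1}, r_{t-1}, u_t)

% Markov (assumed form): the previous dialog state summarizes the history
p_\theta(b_t, r_t \mid b_{t-1}, r_{t-1}, u_t)
```

Under the Markov form the conditioning input stays roughly constant in length across turns, whereas under the non-Markov form it grows with every turn.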

Findings

01. Reduces memory and time costs in rich-resource settings.

02. Improves training efficiency in low-resource scenarios.

03. Maintains performance comparable to non-Markov models.

Abstract

Recently, Transformer-based pretrained language models (PLMs), such as GPT2 and T5, have been leveraged to build generative task-oriented dialog (TOD) systems. A drawback of existing PLM-based models is that their architectures are non-Markov across turns, i.e., the whole history is used as the conditioning input at each turn. First, this brings inefficiencies in memory and computation. Furthermore, using the whole history increases model complexity and may hurt training efficiency, especially when facing small amounts of labeled training data (the low-resource setting). In this paper, motivated by the observation that dialog states could be viewed as Markov states, we propose to build Markovian Generative Architectures (MGA) over PLM backbones for efficient TOD systems. Experiments on MultiWOZ2.1 show that in the rich-resource setting, the proposed Markov models reduce memory and time costs…
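To make the memory and computation argument concrete, the sketch below builds the conditioning input at each turn under both schemes and counts its tokens. It is a toy illustration, not the authors' code. The `Turn` class, the two context builders, whitespace tokenization (real systems use subword tokenizers such as BPE or SentencePiece), and the choice of history (past user utterances and responses only; some systems also include past states and acts) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Turn:
    """One dialog turn (hypothetical structure for illustration)."""
    user: str      # user utterance u_t
    belief: str    # dialog (belief) state b_t, serialized as text
    response: str  # system response r_t


def non_markov_context(turns: List[Turn], t: int) -> List[str]:
    """Non-Markov input at turn t (0-indexed): every past user utterance and
    response, followed by the current user utterance."""
    ctx: List[str] = []
    for past in turns[:t]:
        ctx += past.user.split() + past.response.split()
    ctx += turns[t].user.split()
    return ctx


def markov_context(turns: List[Turn], t: int) -> List[str]:
    """Markov input at turn t (assumed layout): previous belief state and
    previous response only, followed by the current user utterance."""
    ctx: List[str] = []
    if t > 0:
        prev = turns[t - 1]
        ctx += prev.belief.split() + prev.response.split()
    ctx += turns[t].user.split()
    return ctx


# Toy dialog: the same turn repeated six times, tokenized by whitespace.
turn = Turn(
    user="i need a cheap hotel",          # 5 tokens
    belief="hotel price cheap",           # 3 tokens
    response="which area do you prefer",  # 5 tokens
)
dialog = [turn] * 6

for t in range(len(dialog)):
    # Columns: turn index, non-Markov input length, Markov input length.
    print(t, len(non_markov_context(dialog, t)), len(markov_context(dialog, t)))
# 0 5 5
# 1 15 13
# 2 25 13
# 3 35 13
# 4 45 13
# 5 55 13
```

For this toy dialog, the non-Markov input grows by 10 tokens per turn, while the Markov input stays at 13 tokens after the first turn. Because standard Transformer self-attention cost grows quadratically with input length, the gap in memory and computation widens further as dialogs get longer.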

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Speech Recognition and Synthesis

Methods: Attention Is All You Need · Linear Layer · SentencePiece · Gated Linear Unit · Adafactor · Inverse Square Root Schedule · Absolute Position Encodings · Byte Pair Encoding · Position-Wise Feed-Forward Layer