Limitation Learning: Catching Adverse Dialog with GAIL
Noah Kasmanoff, Rahul Zalkikar

TL;DR
This paper applies imitation learning to dialog systems and uses the resulting discriminator to identify limitations and adverse behaviors in conversational models, which could help improve their safety and robustness.
Contribution
It introduces a novel application of imitation learning and discriminator-based analysis to detect adverse behaviors in dialog models.
Findings
The discriminator effectively distinguishes expert conversations from synthetic ones.
The policy can generate coherent responses given a prompt.
The discriminator's results reveal limitations of current dialog models.
Abstract
Imitation learning is a proven method for creating a policy in the absence of rewards, by leveraging expert demonstrations. In this work, we apply imitation learning to conversation. In doing so, we recover a policy capable of talking to a user given a prompt (input state), and a discriminator capable of classifying between expert and synthetic conversation. While our policy is effective, we recover results from our discriminator that indicate the limitations of dialog models. We argue that this technique can be used to identify adverse behavior of arbitrary data models common for dialog oriented tasks.
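To make the discriminator's role concrete, here is a minimal sketch of the kind of expert-versus-synthetic classifier the abstract describes. It is not the authors' implementation: the logistic model, the two-dimensional toy features standing in for encoded (prompt, response) pairs, the function names, and the reward form -log(1 - D) (one common GAIL convention among several) are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def discriminator(w, b, x):
    """Estimated probability that feature vector x comes from the expert."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_discriminator(expert, synthetic, lr=0.5, epochs=200):
    """Fit a logistic classifier with full-batch gradient descent on
    binary cross-entropy (expert label 1, synthetic label 0)."""
    dim = len(expert[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in expert] + [(x, 0.0) for x in synthetic]
    n = len(data)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            err = discriminator(w, b, x) - y  # d(BCE)/d(logit)
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def surrogate_reward(w, b, x, eps=1e-8):
    """Assumed reward signal for the policy: larger when the
    discriminator believes x looks like expert conversation."""
    return -math.log(1.0 - discriminator(w, b, x) + eps)


# Hypothetical toy features standing in for encoded (prompt, response) pairs.
rng = random.Random(0)
expert = [[rng.gauss(1.0, 0.5), rng.gauss(1.0, 0.5)] for _ in range(50)]
synthetic = [[rng.gauss(-1.0, 0.5), rng.gauss(-1.0, 0.5)] for _ in range(50)]

w, b = train_discriminator(expert, synthetic)
```

In a full GAIL loop the policy would be updated to raise `surrogate_reward` on its own responses while the discriminator is retrained on the new samples. Inspecting which pairs the trained discriminator scores as clearly synthetic is one way to surface the model limitations the abstract refers to.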
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Topic Modeling · Natural Language Processing Techniques
