MOSS: End-to-End Dialog System Framework with Modular Supervision
Weixin Liang, Youzhi Tian, Chengcai Chen, Zhou Yu

TL;DR
MOSS is a flexible end-to-end dialog system framework that uses modular supervision from multiple dialog modules to improve performance with limited training data, especially on complex tasks.
Contribution
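The paper introduces MOSS, a novel encoder-decoder framework that incorporates supervision from four intermediate dialog modules (natural language understanding, dialog state tracking, dialog policy learning, and natural language generation), enhancing data efficiency and adaptability in dialog systems.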
Findings
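With only 60% of the training data, MOSS-all (MOSS with supervision from all four dialog modules) outperforms state-of-the-art models on CamRest676.
With only 40% of the training data, MOSS-all outperforms the state-of-the-art model on LaptopNetwork, a new and more complex Chinese laptop network troubleshooting dataset.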
Modular supervision significantly benefits complex dialog tasks with large state and action spaces.
Abstract
A major bottleneck in training end-to-end task-oriented dialog systems is the lack of data. To utilize limited training data more efficiently, we propose Modular Supervision Network (MOSS), an encoder-decoder training framework that can incorporate supervision from various intermediate dialog system modules including natural language understanding, dialog state tracking, dialog policy learning, and natural language generation. With only 60% of the training data, MOSS-all (i.e., MOSS with supervision from all four dialog modules) outperforms state-of-the-art models on CamRest676. Moreover, introducing modular supervision has even bigger benefits when the dialog task has a more complex dialog state and action space. With only 40% of the training data, MOSS-all outperforms the state-of-the-art model on a complex laptop network troubleshooting dataset, LaptopNetwork, that we introduced.…
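As a rough illustration of the modular supervision idea in the abstract, the sketch below assumes each dialog module contributes its own loss and that training sums the losses of whichever modules have annotations. The function name, module keys, optional weights, and numeric values are all hypothetical and are not taken from the paper's code.

```python
from typing import Dict, Iterable, Optional

# The four modules named in the abstract; the short keys are an assumption.
MODULES = ("nlu", "dst", "dpl", "nlg")


def modular_loss(
    module_losses: Dict[str, float],
    supervised: Iterable[str],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Sum the losses of the modules that have supervision available.

    Assumption: the combined objective is a (optionally weighted) sum of
    per-module losses. Modules without annotations are simply left out.
    """
    weights = weights or {}
    total = 0.0
    for name in supervised:
        if name not in MODULES:
            raise ValueError(f"unknown dialog module: {name}")
        # Unlisted modules default to a weight of 1.0.
        total += weights.get(name, 1.0) * module_losses[name]
    return total


# Hypothetical per-module losses for one training batch.
losses = {"nlu": 0.5, "dst": 0.25, "dpl": 1.0, "nlg": 2.0}

# Supervision from all four modules, analogous to the MOSS-all setting.
moss_all = modular_loss(losses, MODULES)

# Supervision from response generation only, as in a plain end-to-end setup.
nlg_only = modular_loss(losses, ["nlg"])
```

Passing a subset of module names mirrors the framework's flexibility: a dataset that lacks, say, dialog policy annotations would just omit `"dpl"` from `supervised`.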
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
