Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Ankit Arun; Soumya Batra; Vikas Bhardwaj; Ashwini Challa; Pinar Donmez; Peyman Heidari; Hakan Inan; Shashank Jain; Anuj Kumar; Shawn Mei; Karthik Mohan; Michael White

arXiv:2011.03877 · cs.CL · November 10, 2020

TL;DR

This paper presents practical techniques and best practices for training small, data-efficient neural language generation models suitable for production in conversational systems, addressing challenges like high data requirements and latency.

Contribution

It introduces a family of sampling and modeling techniques that enable deployment of small, efficient neural NLG models with limited data, along with a comprehensive set of best practices.
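
This summary does not spell out which sampling techniques the paper uses. As an illustration only, here is a minimal sketch of one generic data-reduction strategy, structure-aware (bucketed) sampling, which keeps a small, fixed number of training examples per meaning-representation (MR) structure instead of the full dataset. The bracketed MR string format, the `signature` bucketing key, and `bucketed_sample` are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical example format: (meaning_representation, target_text).
# The MR is assumed to be a string like "INFORM[name=Zizzi][rating=5]".
Example = Tuple[str, str]


def signature(mr: str) -> str:
    """Hypothetical bucketing key: the MR with slot values stripped.

    Drops everything between '=' and the closing ']' of each slot, so
    'INFORM[name=Zizzi][rating=5]' becomes 'INFORM[name][rating]'.
    """
    out: List[str] = []
    skipping = False
    for ch in mr:
        if ch == "=":
            skipping = True      # start of a slot value: stop copying
        elif ch == "]":
            skipping = False     # end of the slot: resume copying
            out.append(ch)
        elif not skipping:
            out.append(ch)
    return "".join(out)


def bucketed_sample(data: List[Example], per_bucket: int, seed: int = 0) -> List[Example]:
    """Keep at most `per_bucket` randomly chosen examples per MR structure."""
    rng = random.Random(seed)
    buckets: Dict[str, List[Example]] = defaultdict(list)
    for ex in data:
        buckets[signature(ex[0])].append(ex)
    sample: List[Example] = []
    for key in sorted(buckets):  # sorted keys give a deterministic order
        items = buckets[key]
        sample.extend(rng.sample(items, min(per_bucket, len(items))))
    return sample


if __name__ == "__main__":
    data = [
        ("INFORM[name=Zizzi][rating=5]", "Zizzi is rated 5."),
        ("INFORM[name=Aromi][rating=3]", "Aromi is rated 3."),
        ("INFORM[name=Bibimbap][rating=4]", "Bibimbap is rated 4."),
        ("CONFIRM[time=7pm]", "Did you say 7pm?"),
    ]
    small = bucketed_sample(data, per_bucket=1)
    print(len(small))  # one example per structure: 2
```

The design choice here is that coverage of distinct MR structures, rather than raw example count, drives how much data is kept; rare structures are never crowded out by frequent ones.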

Findings

01

Domain complexity influences the choice of data-efficient approach.

02

Small models (2MB) can achieve production quality with limited data.

03

The techniques enable reliable deployment of neural NLG in production environments.

Abstract

Natural language generation (NLG) is a critical component in conversational systems, owing to its role of formulating a correct and natural text response. Traditionally, NLG components have been deployed using template-based solutions. Although neural network solutions recently developed in the research community have been shown to provide several benefits, deployment of such model-based solutions has been challenging due to high latency, correctness issues, and high data needs. In this paper, we present approaches that have helped us deploy data-efficient neural solutions for NLG in conversational systems to production. We describe a family of sampling and modeling techniques to attain production quality with light-weight neural network models using only a fraction of the data that would be necessary otherwise, and show a thorough comparison between each. Our results show that domain…