Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models

Sheng-Chieh Lin; Jheng-Hong Yang; Rodrigo Nogueira; Ming-Feng Tsai; Chuan-Ju Wang; Jimmy Lin

arXiv:2004.01909·cs.CL·April 7, 2020·36 cites

TL;DR

This paper empirically evaluates sequence-to-sequence architectures and pretrained language models, especially T5, for conversational question reformulation (CQR) on the CANARD and TREC 2019 CAsT benchmarks.

Contribution

An empirical study of PLMs for CQR, showing that T5 achieves the best results with fewer parameters than similar transformer architectures on both the in-domain dataset (CANARD) and the out-of-domain dataset (TREC 2019 CAsT).

Findings

01

T5 achieves the best results on CANARD and CAsT datasets.

02

Pretrained models outperform traditional sequence-to-sequence architectures.

03

T5 reaches these results with fewer parameters than similar transformer architectures.

Abstract

This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs). We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task. In CQR benchmarks of task-oriented dialogue systems, we evaluate fine-tuned PLMs on the recently-introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-domain task. Examining a variety of architectures with different numbers of parameters, we demonstrate that the recent text-to-text transfer transformer (T5) achieves the best results both on CANARD and CAsT with fewer parameters, compared to similar transformer architectures.
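
As a rough illustration of the setup the abstract describes, the sketch below flattens a dialogue history and the current question into a single source string for a sequence-to-sequence model, then scores a reference rewrite with a token-level maximum likelihood objective. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `" ||| "` separator, and the example probabilities are all hypothetical and do not come from the source.

```python
import math
from typing import List

# Hypothetical separator between turns; the source does not specify one.
SEP = " ||| "


def build_cqr_input(history: List[str], question: str, sep: str = SEP) -> str:
    """Flatten dialogue history plus the current question into one source string."""
    turns = [t.strip() for t in history if t.strip()]
    return sep.join(turns + [question.strip()])


def sequence_nll(token_probs: List[float]) -> float:
    """Negative log-likelihood of a reference rewrite.

    token_probs[i] is the probability the model assigned to the i-th
    reference token. The loss is a plain sum of per-token terms, which is
    the token-level form of the maximum likelihood objective.
    """
    return -sum(math.log(p) for p in token_probs)


if __name__ == "__main__":
    source = build_cqr_input(
        ["What is throat cancer?", "It is a tumor in the throat."],
        "Is it treatable?",
    )
    print(source)
    # Made-up per-token probabilities for an illustrative reference rewrite.
    print(sequence_nll([0.9, 0.8, 0.95, 0.7]))
```

Because the loss is a sum of per-token terms, each reference token contributes independently to the training signal. This is one way to read the abstract's remark about the objective, and it is the gap that pretrained models are leveraged to address.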

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax