Does Dialog Length matter for Next Response Selection task? An Empirical Study
Jatin Ganhotra, Sachindra Joshi

TL;DR
This study empirically investigates how dialog length affects BERT's performance in Next Response Selection tasks, finding minimal impact and that simple truncation often suffices, despite BERT's sequence length limitations.
Contribution
The paper provides the first empirical analysis of dialog length effects on BERT-based dialog models, highlighting that truncation is effective despite BERT's sequence length constraints.
Findings
Long dialogs have little impact on BERT performance.
Simple truncation of dialogs works effectively.
BERT's limitations on sequence length are less critical for dialog tasks.
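The truncation mentioned in the findings above could look roughly like the following sketch. It is illustrative only and is not the authors' implementation: this page does not say which end of the dialog the paper truncates, so the choice to keep the most recent context tokens is an assumption, as are the helper names `toy_tokenize`, `truncate_dialog`, and `build_input`. Whitespace splitting stands in for a real wordpiece tokenizer.

```python
from typing import List, Tuple

MAX_SEQ_LEN = 512  # BERT's default maximum wordpiece sequence length
NUM_SPECIAL = 3    # [CLS], [SEP] after the context, [SEP] after the response


def toy_tokenize(text: str) -> List[str]:
    """Whitespace split; a hypothetical stand-in for a wordpiece tokenizer."""
    return text.split()


def truncate_dialog(
    utterances: List[str], response: str, max_len: int = MAX_SEQ_LEN
) -> Tuple[List[str], List[str]]:
    """Fit a dialog context and a candidate response into max_len tokens.

    Assumption: the oldest context tokens are dropped first, so the most
    recent turns survive. The response is clipped only if it alone would
    overflow the sequence.
    """
    response_tokens = toy_tokenize(response)[: max(max_len - NUM_SPECIAL, 0)]
    budget = max_len - NUM_SPECIAL - len(response_tokens)

    context_tokens: List[str] = []
    for utterance in utterances:
        context_tokens.extend(toy_tokenize(utterance))

    # Guard budget == 0 explicitly: a slice of [-0:] would keep everything.
    context_tokens = context_tokens[-budget:] if budget > 0 else []
    return context_tokens, response_tokens


def build_input(context_tokens: List[str], response_tokens: List[str]) -> List[str]:
    """Assemble the usual sentence-pair layout: [CLS] context [SEP] response [SEP]."""
    return ["[CLS]"] + context_tokens + ["[SEP]"] + response_tokens + ["[SEP]"]
```

For example, `truncate_dialog(["a b c d", "e f g h"], "x y", max_len=8)` leaves room for three context tokens, so only the tail of the last utterance is kept. Dropping tokens from the start of the context instead of the end is one design choice among several; keeping the first turns, or a mix of both ends, would be equally simple to implement.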
Abstract
In the last few years, the release of BERT, a multilingual transformer-based model, has taken the NLP community by storm. BERT-based models have achieved state-of-the-art results on various NLP tasks, including dialog tasks. One of the limitations of BERT is its inability to handle long text sequences. By default, BERT has a maximum wordpiece token sequence length of 512. Recently, there has been renewed interest in addressing this limitation through new self-attention-based architectures. However, there has been little to no research on the impact of this limitation on dialog tasks. Dialog tasks are inherently different from other NLP tasks due to: a) the presence of multiple utterances from multiple speakers, which may be interlinked across different turns, and b) the longer length of dialogs. In this work, we…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Advanced Text Analysis Techniques
Methods: Linear Layer · Layer Normalization · Residual Connection · Attention Dropout · Attention Is All You Need · Dense Connections · Adam · Linear Warmup With Linear Decay · Dropout · Softmax
