Extracting Similar Questions From Naturally-occurring Business Conversations
Xiliang Zhu, David Rossouw, Shayna Gardiner, Simon Corston-Oliver

TL;DR
This paper shows that sentence-level representations from some off-the-shelf contextualized embedding models, such as BERT, capture question similarity poorly in business conversations. It proposes using appropriately tuned representations together with a small set of exemplars to group and visualize similar questions.
Contribution
It introduces a method that combines appropriately tuned sentence representations with a small set of exemplars to identify semantically similar questions in real-world English business conversations.
Findings
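Sentence-level representations from some off-the-shelf contextualized embedding models (e.g., BERT) occupy a narrow region of the embedding space, so they perform poorly at identifying semantically similar questions in business conversations.
Appropriately tuned representations, combined with a small set of exemplars, group questions of interest to business users more effectively.
The resulting visualization of grouped questions supports data exploration and employee coaching.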
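One common way to quantify a "narrow distribution" is the average pairwise cosine similarity of a set of sentence vectors: evenly spread vectors average near 0, while vectors crowded into a narrow cone average near 1. The sketch below illustrates this on synthetic data only; the metric, the function name `mean_pairwise_cosine`, and the 768-dimensional toy vectors are assumptions for illustration, not the paper's reported measurement.

```python
import numpy as np

def mean_pairwise_cosine(embeddings: np.ndarray) -> float:
    """Average cosine similarity over all distinct pairs of rows (hypothetical helper)."""
    # L2-normalize each row so dot products become cosine similarities
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = sims.shape[0]
    # Exclude the diagonal, where every self-similarity is exactly 1
    off_diag_sum = sims.sum() - np.trace(sims)
    return float(off_diag_sum / (n * (n - 1)))

rng = np.random.default_rng(0)
dim = 768  # assumed to match a BERT-base hidden size; synthetic data, not real embeddings

# Isotropic toy vectors: spread evenly in all directions
isotropic = rng.standard_normal((200, dim))
# "Narrow" toy vectors: a large shared offset plus small noise, mimicking a cone
narrow = 5.0 * np.ones(dim) + rng.standard_normal((200, dim))

print(mean_pairwise_cosine(isotropic))  # near 0
print(mean_pairwise_cosine(narrow))     # near 1
```

When every vector sits in the same narrow cone, even unrelated questions look highly similar, which is why a fixed similarity threshold on untuned representations separates questions poorly.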
Abstract
Pre-trained contextualized embedding models such as BERT are a standard building block in many natural language processing systems. We demonstrate that the sentence-level representations produced by some off-the-shelf contextualized embedding models have a narrow distribution in the embedding space, and thus perform poorly for the task of identifying semantically similar questions in real-world English business conversations. We describe a method that uses appropriately tuned representations and a small set of exemplars to group questions of interest to business users in a visualization that can be used for data exploration or employee coaching.
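The exemplar idea can be sketched as nearest-exemplar assignment: each question embedding takes the group label of its most similar exemplar, provided the cosine similarity clears a threshold; otherwise it stays ungrouped. This is an assumed illustration, not the authors' published procedure. The function `assign_to_exemplars`, the `threshold` value, the group names, and the 3-dimensional toy vectors are all hypothetical; in practice the vectors would come from an appropriately tuned sentence encoder.

```python
import numpy as np

def l2_normalize(x: np.ndarray) -> np.ndarray:
    # Scale each row to unit length so dot products equal cosine similarity
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def assign_to_exemplars(questions, exemplars, labels, threshold=0.5):
    """Label each question with the group of its most similar exemplar (hypothetical helper).

    questions: (n, d) array of question embeddings
    exemplars: (m, d) array of exemplar embeddings
    labels:    list of m group names, one per exemplar
    Questions whose best cosine similarity is below `threshold` get None.
    """
    sims = l2_normalize(questions) @ l2_normalize(exemplars).T  # shape (n, m)
    best = sims.argmax(axis=1)                                  # index of closest exemplar
    best_sim = sims[np.arange(len(questions)), best]            # its similarity score
    return [labels[j] if s >= threshold else None for j, s in zip(best, best_sim)]

# Toy example: two hypothetical question groups, one exemplar each
exemplars = np.array([[1.0, 0.0, 0.0],   # "pricing" exemplar
                      [0.0, 1.0, 0.0]])  # "timeline" exemplar
labels = ["pricing", "timeline"]
questions = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.0, 1.0]])  # unrelated to either exemplar

print(assign_to_exemplars(questions, exemplars, labels))
# ['pricing', 'timeline', None]
```

A small exemplar set keeps the approach cheap to adapt: a business user can define a new group of interest by supplying a few example questions rather than labeling a full training set.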
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Softmax · Layer Normalization · Attention Dropout · Adam · WordPiece · Residual Connection
