TL;DR
The paper introduces TaDSE, a template-aware contrastive learning method that leverages token-level template information to produce high-quality dialogue sentence embeddings, improving performance on multiple benchmarks.
Contribution
It presents a novel augmentation technique using template information and a synthetic dataset to enhance dialogue sentence embedding learning.
Findings
TaDSE outperforms previous SOTA methods on five dialogue benchmarks.
The synthetic dataset diversifies utterance-template associations effectively.
A new semantic compression test reveals a correlation with embedding uniformity and alignment.
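For readers unfamiliar with the two embedding properties named in the last finding, the sketch below computes alignment and uniformity as they are commonly defined for contrastive representations on the unit hypersphere. It is an illustration only: the function names, the temperature `t=2.0`, and the use of plain Python lists are assumptions, and the paper's exact formulation and its semantic compression test are not reproduced here.

```python
import math
from itertools import combinations


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment(positive_pairs):
    """Mean squared distance between normalized positive pairs (lower is better)."""
    dists = [sq_dist(l2_normalize(a), l2_normalize(b)) for a, b in positive_pairs]
    return sum(dists) / len(dists)


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs (lower is more uniform).

    t=2.0 is an assumed, commonly used default, not a value taken from the paper.
    """
    normed = [l2_normalize(e) for e in embeddings]
    potentials = [math.exp(-t * sq_dist(a, b)) for a, b in combinations(normed, 2)]
    return math.log(sum(potentials) / len(potentials))
```

Under these definitions, identical positive pairs give an alignment of 0, and embeddings spread evenly over the sphere give a more negative uniformity value than embeddings clustered together.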
Abstract
Learning high-quality sentence embeddings from dialogues has drawn increasing attention, as it is essential for solving a variety of dialogue-oriented tasks with low annotation cost. Annotating and gathering utterance relationships in conversations is difficult, while token-level annotations, e.g., entities, slots, and templates, are much easier to obtain. Other sentence embedding methods are usually sentence-level self-supervised frameworks and cannot utilize token-level extra knowledge. We introduce Template-aware Dialogue Sentence Embedding (TaDSE), a novel augmentation method that utilizes template information to learn utterance embeddings via a self-supervised contrastive learning framework. We further enhance the effect with a synthetically augmented dataset that diversifies utterance-template association, in which slot-filling is a preliminary step. We evaluate TaDSE performance on…
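The abstract names slot-filling as a preliminary step for diversifying utterance-template associations. A minimal sketch of that idea follows, assuming templates mark slots as `[slot_name]` and that candidate values are supplied per slot. The bracket syntax, the function name `fill_template`, and the example data are all hypothetical and are not taken from the paper or its code.

```python
from itertools import product


def fill_template(template, slot_values):
    """Return every utterance obtained by filling the template's slots.

    `template` uses the assumed "[slot_name]" placeholder syntax;
    `slot_values` maps each slot name to a list of candidate values.
    """
    # Only consider slots that actually appear in this template.
    slots = [name for name in slot_values if "[" + name + "]" in template]
    utterances = []
    # Enumerate every combination of candidate values for those slots.
    for combo in product(*(slot_values[name] for name in slots)):
        text = template
        for name, value in zip(slots, combo):
            text = text.replace("[" + name + "]", value)
        utterances.append(text)
    return utterances


# Hypothetical example data: one template paired with several synthetic utterances.
example = fill_template(
    "book a flight to [city] on [day]",
    {"city": ["paris", "tokyo"], "day": ["monday"]},
)
```

Each generated utterance keeps a link to the template it came from, which is the kind of utterance-template pairing a template-aware contrastive objective could draw on.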
