Unsupervised Dialogue Topic Segmentation in Hyperdimensional Space

Seongmin Park; Jinkyu Seo; Jihwa Lee

arXiv:2308.10464·cs.CL·August 22, 2023

TL;DR

HyperSeg is a hyperdimensional computing (HDC) method for unsupervised dialogue topic segmentation. It outperforms the prior state of the art on 4 of 5 segmentation benchmarks, runs 10 times faster on average, and improves downstream summarization accuracy.

Contribution

The paper presents HyperSeg, an HDC-based approach to unsupervised dialogue topic segmentation that improves on prior methods in both segmentation accuracy and speed.

Findings

01

Outperforms the prior state of the art on 4 of 5 segmentation benchmarks, even when baselines are given partial access to the ground truth

02

Is 10 times faster than baseline methods on average

03

Enhances downstream summarization accuracy

Abstract

We present HyperSeg, a hyperdimensional computing (HDC) approach to unsupervised dialogue topic segmentation. HDC is a class of vector symbolic architectures that leverages the probabilistic orthogonality of randomly drawn vectors at extremely high dimensions (typically over 10,000). HDC generates rich token representations through its low-cost initialization of many unrelated vectors. This is especially beneficial in topic segmentation, which often operates as a resource-constrained pre-processing step for downstream transcript understanding tasks. HyperSeg outperforms the current state-of-the-art in 4 out of 5 segmentation benchmarks -- even when baselines are given partial access to the ground truth -- and is 10 times faster on average. We show that HyperSeg also improves downstream summarization accuracy. With HyperSeg, we demonstrate the viability of HDC in a major language task.
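
The "probabilistic orthogonality" the abstract relies on can be checked numerically. The sketch below is not HyperSeg itself: it only draws random bipolar (+1/-1) vectors, which is one common HDC convention assumed here for illustration, and compares the largest pairwise cosine similarity at a low and a high dimension. The function names, the vector count of 20, and the low dimension of 16 are all hypothetical choices, not taken from the paper.

```python
import random


def random_hypervector(dim, rng):
    """Draw a random bipolar (+1/-1) vector of length `dim` (assumed encoding)."""
    return [1 if rng.random() < 0.5 else -1 for _ in range(dim)]


def cosine_similarity(a, b):
    """Cosine similarity of two bipolar vectors of equal length.

    Every entry is +1 or -1, so both norms equal sqrt(dim) and the
    cosine reduces to the dot product divided by the dimension.
    """
    return sum(x * y for x, y in zip(a, b)) / len(a)


def max_abs_pairwise_similarity(dim, count, seed=0):
    """Largest |cosine| over all distinct pairs among `count` random vectors."""
    rng = random.Random(seed)
    vectors = [random_hypervector(dim, rng) for _ in range(count)]
    return max(
        abs(cosine_similarity(vectors[i], vectors[j]))
        for i in range(count)
        for j in range(i + 1, count)
    )


# Compare a low-dimensional space with the ~10,000 dimensions the abstract cites.
low = max_abs_pairwise_similarity(dim=16, count=20)
high = max_abs_pairwise_similarity(dim=10_000, count=20)
print(f"dim=16:    max |cos| = {low:.3f}")
print(f"dim=10000: max |cos| = {high:.3f}")
```

At 10,000 dimensions the cosine between two independent random bipolar vectors has a standard deviation of about 0.01, so even the most similar pair stays close to orthogonal, while at 16 dimensions accidental overlap is common. This is why many unrelated token vectors can be initialized cheaply without a training step.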
