Local Topology Measures of Contextual Language Model Latent Spaces With Applications to Dialogue Term Extraction
Benjamin Matthias Ruppik, Michael Heck, Carel van Niekerk, Renato Vukovic, Hsien-chin Lin, Shutong Feng, Marcus Zibrowius, Milica Gašić

TL;DR
This paper introduces local topology measures of language model latent spaces to improve dialogue term extraction, addressing limitations of traditional sequence tagging methods by leveraging the structure of embedding spaces.
Contribution
It proposes novel complexity measures of local topology in embedding spaces and demonstrates their effectiveness in dialogue term extraction tasks.
Findings
Local topology features improve dialogue term extraction accuracy.
Embedding space structure reveals semantic properties.
The method reduces reliance on fine-tuning the embedding model.
Abstract
A common approach for sequence tagging tasks based on contextual word representations is to train a machine learning classifier directly on these embedding vectors. This approach has two shortcomings. First, such methods consider single input sequences in isolation and are unable to put an individual embedding vector in relation to vectors outside the current local context of use. Second, the high performance of these models relies on fine-tuning the embedding model in conjunction with the classifier, which may not always be feasible due to the size or inaccessibility of the underlying feature-generation model. It is thus desirable, given a collection of embedding vectors of a corpus, i.e., a datastore, to find features of each vector that describe its relation to other, similar vectors in the datastore. With this in mind, we introduce complexity measures of the local topology of the…
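To make the datastore idea concrete, here is a minimal sketch of how one might derive a neighborhood-based feature for a single embedding vector. It is not the authors' measure: the abstract is cut off before the actual definitions, so the sketch substitutes a simple stand-in. It takes the k nearest datastore vectors to a query vector and returns the minimum-spanning-tree edge lengths of that neighborhood, which coincide with the finite death times of zero-dimensional persistent homology of its Vietoris-Rips filtration. All function names, the Euclidean metric, and the choice of k are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def k_nearest(datastore, query, k):
    """Return the k datastore vectors closest to `query` (brute force)."""
    return sorted(datastore, key=lambda v: euclidean(v, query))[:k]


def h0_death_times(points):
    """Sorted minimum-spanning-tree edge lengths of `points`.

    These equal the finite death times of 0-dimensional persistent
    homology of the Vietoris-Rips filtration on the point set.
    Computed with Prim's algorithm in O(n^2) distance evaluations.
    """
    n = len(points)
    if n < 2:
        return []
    # best[i] = shortest distance from point i to the current tree
    best = [euclidean(points[0], p) for p in points]
    in_tree = [False] * n
    in_tree[0] = True
    deaths = []
    for _ in range(n - 1):
        # pick the point outside the tree that is closest to it
        j = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        deaths.append(best[j])
        in_tree[j] = True
        for i in range(n):
            if not in_tree[i]:
                best[i] = min(best[i], euclidean(points[j], points[i]))
    return sorted(deaths)


def local_features(datastore, query, k=5):
    """Hypothetical per-vector features summarising the local neighborhood."""
    neighbourhood = k_nearest(datastore, query, k)
    deaths = h0_death_times(neighbourhood)
    mean_death = sum(deaths) / len(deaths) if deaths else 0.0
    return {"mean_death": mean_death, "max_death": max(deaths, default=0.0)}


if __name__ == "__main__":
    # Toy 2-D "datastore"; real contextual embeddings have hundreds of dimensions.
    store = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
    print(local_features(store, (0.0, 0.0), k=3))
```

Features of this kind depend only on the stored vectors, so under the assumptions above they could be passed to a tagger alongside the raw embedding without fine-tuning the embedding model. For a datastore of realistic size, the brute-force neighbor search would need to be replaced by an approximate nearest-neighbor index.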
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Natural Language Processing Techniques
