Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining

Eyal Orbach; Lev Haikin; Nelly David; Avi Faizakof

arXiv:2405.07263 · cs.CL · May 14, 2024

Open Access

TL;DR

This paper proposes a novel approach to phrase mining using contextualized word embeddings that can be aggregated for arbitrary spans, improving retrieval effectiveness in noisy contexts without significant computational overhead.

Contribution

It introduces a modification to contrastive loss enabling embeddings to represent meaningful spans, enhancing phrase retrieval in noisy, real-world scenarios.

Findings

01

Improved phrase retrieval accuracy on the new dataset.

02

Effective span representations with minimal additional computation.

03

Demonstrated benefits over traditional sentence-level embeddings.

Abstract

Dense vector representations for sentences have made significant progress in recent years, as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges in making effective use of dense representations. We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector is not sufficient for effective phrase retrieval. We therefore look into the notion of representing multiple sub-sentence, consecutive word spans, each with its own dense vector. We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations. Accordingly, we make an argument for contextualized word/token embeddings that can be aggregated for arbitrary word spans while maintaining the span's semantic meaning. We introduce a…
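The idea of aggregating token embeddings over arbitrary word spans can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the aggregation is taken to be mean pooling, the token vectors are random stand-ins for the output of a contextual encoder, and the function names `span_embeddings` and `best_span` are hypothetical. It shows why aggregation can be cheap once per-token vectors exist: prefix sums give the pooled vector of any span in constant time per span, with no further encoder calls.

```python
import numpy as np

def span_embeddings(token_vecs, max_len):
    """Mean-pool every consecutive token span of length 1..max_len.

    Assumption: mean pooling stands in for the aggregation step.
    Returns (spans, vecs), where spans[i] = (start, end) with end exclusive.
    """
    n, d = token_vecs.shape
    # prefix[i] holds the sum of the first i token vectors
    prefix = np.vstack([np.zeros((1, d)), np.cumsum(token_vecs, axis=0)])
    spans, vecs = [], []
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            spans.append((start, end))
            vecs.append((prefix[end] - prefix[start]) / (end - start))
    return spans, np.array(vecs)

def best_span(query_vec, token_vecs, max_len=4):
    """Return the span whose pooled vector has the highest cosine similarity to the query."""
    spans, vecs = span_embeddings(token_vecs, max_len)
    q = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    norms = np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    scores = (vecs / norms) @ q
    i = int(np.argmax(scores))
    return spans[i], float(scores[i])

# Random vectors stand in for contextualized token embeddings of a 6-token sentence.
rng = np.random.default_rng(0)
token_vecs = rng.normal(size=(6, 8))
# Hypothetical query: the pooled vector of tokens 2 and 3.
query_vec = token_vecs[2:4].mean(axis=0)
span, score = best_span(query_vec, token_vecs)
```

In this sketch the sentence is encoded once and all candidate spans are derived from the same token vectors, in contrast to encoding each span separately. Whether pooled token vectors preserve a span's meaning depends on how the encoder was trained, which is the concern the abstract raises.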

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques