TL;DR
UCPhrase is an unsupervised, context-aware phrase tagging method. It leverages transformer attention maps and uses silver labels mined from word sequences that consistently co-occur within each document. This lets it identify emerging and domain-specific quality phrases without human-annotated labels or knowledge-base supervision.
Contribution
The paper introduces UCPhrase, a novel unsupervised approach that combines document-level silver labels with transformer attention to detect quality phrases in context, especially rare and emerging ones.
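To illustrate what "leveraging transformer attention" can mean for a candidate span, the sketch below assumes the tagger looks at the attention that tokens inside the span pay to each other, taken from every layer and head of a pretrained language model. Each span is then represented as a stack of small square maps. The function name `span_attention_features`, the `(layers, heads, seq_len, seq_len)` input layout, and the zero-padding length `max_span_len` are hypothetical choices for this example. The sketch does not show how to obtain `attn` from a real model (for example, with `output_attentions=True` in Hugging Face Transformers), and it does not cover the classifier that consumes these features.

```python
import numpy as np


def span_attention_features(attn, start, end, max_span_len=8):
    """Slice intra-span attention from every layer and head (hypothetical helper).

    attn: array of shape (layers, heads, seq_len, seq_len).
    Returns an array of shape (layers * heads, max_span_len, max_span_len)
    holding the span's attention sub-maps, zero-padded in the bottom/right.
    """
    layers, heads, seq_len, _ = attn.shape
    span_len = end - start
    if not (0 <= start < end <= seq_len) or span_len > max_span_len:
        raise ValueError("span out of range or longer than max_span_len")
    # Attention among the tokens inside [start, end) only.
    sub = attn[:, :, start:end, start:end]  # (layers, heads, span_len, span_len)
    feats = np.zeros((layers * heads, max_span_len, max_span_len), dtype=attn.dtype)
    # Flatten layers and heads into one "channel" axis: index = layer * heads + head.
    feats[:, :span_len, :span_len] = sub.reshape(layers * heads, span_len, span_len)
    return feats
```

Fixed-size, channel-stacked maps like these can be fed to a small image-style classifier. That is one plausible design, assumed here, for keeping the per-span model light while the language model itself stays frozen.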
Findings
Outperforms state-of-the-art unsupervised methods in phrase ranking and extraction.
Effectively captures emerging and out-of-KB phrases.
Demonstrates robustness across multiple datasets and tasks.
Abstract
Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, heavily rely on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels root deeply in…
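The abstract's core idea is that word sequences which repeat within a single document are likely phrases and can serve as silver labels. A minimal sketch of that idea follows. It counts every n-gram in one document, keeps those that occur at least `min_freq` times, and then drops any sequence that only ever appears inside a longer repeated one. The thresholds (`min_len`, `max_len`, `min_freq`), the maximality rule, and the function name are assumptions for illustration, not the paper's exact procedure. The sketch also does no stopword or punctuation filtering, so a repeated sequence like "of the" would be mined as well.

```python
from collections import Counter


def mine_silver_phrases(tokens, min_len=2, max_len=5, min_freq=2):
    """Return maximal word sequences that repeat within one document (hypothetical)."""
    counts = Counter()
    n = len(tokens)
    # Count every n-gram of length min_len..max_len in this document only.
    for length in range(min_len, max_len + 1):
        for start in range(n - length + 1):
            counts[tuple(tokens[start:start + length])] += 1
    frequent = {gram: c for gram, c in counts.items() if c >= min_freq}

    def contained(short, long_):
        k = len(short)
        return any(long_[i:i + k] == short for i in range(len(long_) - k + 1))

    silver = []
    for gram, c in frequent.items():
        # Drop a gram if a longer frequent gram contains it with the same count,
        # i.e. the shorter sequence never occurs outside the longer one.
        absorbed = any(
            len(other) > len(gram) and frequent[other] == c and contained(gram, other)
            for other in frequent
        )
        if not absorbed:
            silver.append(" ".join(gram))
    return sorted(silver)
```

For example, on the document "deep neural network works . deep neural network fails", the bigrams "deep neural" and "neural network" are absorbed by the trigram that contains them. Only "deep neural network" is returned as a silver label. Because the labels come from each document's own text rather than from a knowledge base, they can cover phrases that no KB lists yet.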
