Duration modeling with semi-Markov Conditional Random Fields for keyphrase extraction
Xiaolei Lu, Tommy W.S. Chow

TL;DR
This paper introduces DM-SMCRFs, a novel semi-Markov model that encodes segment-level features and duration information for more effective keyphrase extraction without needing candidate generation or post-processing.
Contribution
The paper proposes DM-SMCRFs, a new duration-aware semi-Markov CRF model that improves keyphrase extraction by modeling segment durations and segment-level features.
Findings
DM-SMCRFs outperform existing methods on multiple datasets.
The model encodes segment-level features and the duration (length) of keyphrases.
Experimental results demonstrate improved keyphrase extraction accuracy.
Abstract
Existing methods for keyphrase extraction need preprocessing to generate candidate phrases or post-processing to transform keywords into keyphrases. In this paper, we propose a novel approach called duration modeling with semi-Markov Conditional Random Fields (DM-SMCRFs) for keyphrase extraction. First, based on the property of the semi-Markov chain, DM-SMCRFs can encode segment-level features and sequentially classify the phrases in a sentence as keyphrase or non-keyphrase. Second, by assuming independence between state transition and state duration, DM-SMCRFs model the distribution of the duration (length) of keyphrases to further explore state duration information, which can help identify the size of a keyphrase. Based on the convexity of the parametric duration feature derived from the duration distribution, a constrained Viterbi algorithm is derived to improve the performance of decoding in…
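To make the idea of segment-level decoding with a duration term concrete, here is a minimal sketch in Python. It is not the authors' DM-SMCRFs or their constrained Viterbi algorithm, since the abstract above is truncated before those details. It shows a generic semi-Markov Viterbi search with a per-label cap on segment length and a hand-set, Gaussian-shaped duration score. The function names, the toy lexicon, the weights, and the parameters `MU` and `SIGMA` are all hypothetical and chosen only for illustration.

```python
import math


def semi_markov_viterbi(n, labels, segment_score, max_len):
    """Best segmentation of n tokens into labeled segments.

    segment_score(start, end, label, prev_label) scores tokens[start:end]
    (end exclusive); prev_label is None for the first segment.
    max_len[label] caps the duration of segments with that label.
    Returns (score, [(start, end, label), ...]).
    """
    # best[i][y]: top score of any segmentation of tokens[:i] whose last
    # segment has label y. Position 0 holds a dummy start state (None).
    best = [{None: 0.0}] + [{} for _ in range(n)]
    back = [{} for _ in range(n + 1)]
    for end in range(1, n + 1):
        for y in labels:
            # Only durations up to the label's cap are explored.
            for d in range(1, min(max_len[y], end) + 1):
                start = end - d
                for y_prev, prev_score in best[start].items():
                    cand = prev_score + segment_score(start, end, y, y_prev)
                    if cand > best[end].get(y, -math.inf):
                        best[end][y] = cand
                        back[end][y] = (start, y_prev)
    # Follow back-pointers from the best final label.
    y = max(best[n], key=best[n].get)
    score, end, segments = best[n][y], n, []
    while end > 0:
        start, y_prev = back[end][y]
        segments.append((start, end, y))
        end, y = start, y_prev
    return score, segments[::-1]


# Toy inputs (all hypothetical, not taken from the paper).
TOKENS = "we study conditional random fields for keyphrase extraction".split()
LEXICON = {"conditional", "random", "fields", "keyphrase", "extraction"}
MU, SIGMA = 3.0, 1.0  # assumed mean and spread of keyphrase length


def toy_score(start, end, label, prev_label):
    """Hand-set segment score; prev_label is unused (no transition weights)."""
    words = TOKENS[start:end]
    if label == "KP":
        content = sum(1.0 if w in LEXICON else -2.0 for w in words)
        # Duration term: unnormalized Gaussian log-density of segment length.
        duration = -0.5 * ((end - start - MU) / SIGMA) ** 2
        return content + duration
    return sum(0.5 if w not in LEXICON else -1.0 for w in words)


def extract_keyphrases():
    """Decode the toy sentence and return (score, keyphrase strings)."""
    score, segments = semi_markov_viterbi(
        len(TOKENS), ["KP", "O"], toy_score, {"KP": 4, "O": 1}
    )
    return score, [" ".join(TOKENS[s:e]) for s, e, y in segments if y == "KP"]
```

With these toy weights, `extract_keyphrases()` returns a score of 6.0 and the phrases `["conditional random fields", "keyphrase extraction"]`. Whole phrases are labeled in one step, so no candidate generation or keyword-merging pass is needed, and the duration term is what keeps the three-word phrase from being split into shorter segments.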
