Neural Sequence Segmentation as Determining the Leftmost Segments
Yangming Li, Lemao Liu, Kaisheng Yao

TL;DR
This paper introduces a neural framework for sequence segmentation that incrementally identifies the leftmost segment of the remaining input, capturing long-term dependencies among segments and outperforming previous token-level methods on syntactic chunking and Chinese POS tagging.
Contribution
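The paper proposes a segment-level segmentation framework that uses the LSTM-minus technique to build phrase representations and a recurrent neural network (RNN) to model the iterative determination of leftmost segments. This moves beyond token-level methods to better capture long-term dependencies.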
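A minimal sketch of an LSTM-minus span representation, assuming the common formulation in which a span's forward feature is the forward hidden state at its right end minus the one just before its left end, and its backward feature is the backward hidden state at its left end minus the one just after its right end. The exact formulation in the paper may differ. The hidden states here are toy running sums standing in for real BiLSTM outputs, and `lstm_minus_span` and `toy_states` are hypothetical names, not the authors' code.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def toy_states(features: Sequence[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Stand-in for BiLSTM outputs (assumption): running sums in each direction.

    Positions 0 and n+1 are sentinel states, so tokens occupy 1..n.
    """
    dim, n = len(features[0]), len(features)
    forward: List[Vector] = [[0.0] * dim]
    for x in features:
        forward.append([a + b for a, b in zip(forward[-1], x)])
    forward.append(list(forward[-1]))  # sentinel at n+1
    backward: List[Vector] = [[0.0] * dim for _ in range(n + 2)]
    for k in range(n, 0, -1):
        backward[k] = [a + b for a, b in zip(backward[k + 1], features[k - 1])]
    backward[0] = list(backward[1])  # sentinel at 0
    return forward, backward


def lstm_minus_span(forward: List[Vector], backward: List[Vector], i: int, j: int) -> Vector:
    """Representation of tokens i..j (1-indexed, inclusive) via state differences."""
    n = len(forward) - 2
    if not 1 <= i <= j <= n:
        raise ValueError(f"invalid span ({i}, {j}) for length {n}")
    fwd = [a - b for a, b in zip(forward[j], forward[i - 1])]   # f_j - f_{i-1}
    bwd = [a - b for a, b in zip(backward[i], backward[j + 1])]  # b_i - b_{j+1}
    return fwd + bwd  # concatenate both directions


if __name__ == "__main__":
    f, b = toy_states([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
    print(lstm_minus_span(f, b, 2, 3))
```

Because the toy states are running sums, each difference recovers exactly the sum of the span's token features. This is the intuition behind LSTM-minus: any span's representation costs one subtraction per direction once the sentence has been encoded.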
Findings
Outperforms all baselines in syntactic chunking and Chinese POS tagging
Achieves new state-of-the-art results on three datasets
Effectively models long-term dependencies in long sentences
Abstract
Prior methods for text segmentation mostly operate at the token level. Although adequate, this token-level nature limits their potential to capture long-term dependencies among segments. In this work, we propose a novel framework that incrementally segments natural language sentences at the segment level. At every step of segmentation, it recognizes the leftmost segment of the remaining sequence. Our implementation uses the LSTM-minus technique to construct phrase representations and recurrent neural networks (RNNs) to model the iterations of determining the leftmost segments. We have conducted extensive experiments on syntactic chunking and Chinese part-of-speech (POS) tagging across 3 datasets, demonstrating that our methods significantly outperform all previous baselines and achieve new state-of-the-art results. Moreover, qualitative analysis and the study on segmenting long-length…
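For illustration, the "leftmost segment" loop can be sketched as a greedy procedure. Starting from the first unsegmented token, it scores every candidate segment that begins there, commits to the best one, and continues from the token after it. The scorer below is a hypothetical lexicon lookup. In the paper, phrase representations come from LSTM-minus and an RNN models the sequence of decisions; the `history` argument here only stands in for that RNN state. `segment_leftmost`, `max_len`, and the lexicon are assumptions, and segment labels (chunk types or POS tags) are omitted.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[int, int]  # (start, end): 0-indexed, end exclusive
Scorer = Callable[[Sequence[str], int, int, List[Segment]], float]


def segment_leftmost(tokens: Sequence[str], score: Scorer, max_len: int = 4) -> List[Segment]:
    """Repeatedly pick the best-scoring leftmost segment of the remaining tokens."""
    segments: List[Segment] = []
    start = 0
    while start < len(tokens):
        # Every candidate leftmost segment begins at `start` (capped at max_len tokens).
        candidates = range(start + 1, min(start + max_len, len(tokens)) + 1)
        # `segments` is passed as history; a learned scorer could condition on it.
        end = max(candidates, key=lambda e: score(tokens, start, e, segments))
        segments.append((start, end))
        start = end  # the remaining sequence begins right after the chosen segment
    return segments


# Hypothetical toy scorer: reward known multi-word chunks, penalize unknown ones.
LEXICON = {("the", "black", "cat"), ("sat",), ("on",), ("the", "mat")}


def lexicon_score(tokens: Sequence[str], start: int, end: int, history: List[Segment]) -> float:
    span = tuple(tokens[start:end])
    if span in LEXICON:
        return float(len(span))
    return 0.0 if len(span) == 1 else -1.0


if __name__ == "__main__":
    words = "the black cat sat on the mat".split()
    print(segment_leftmost(words, lexicon_score))
```

Each step commits to one segment before looking further right, so decisions are made left to right over segments rather than tokens. This is what lets a segment-level model condition each choice on whole previously chosen segments.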
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
