Recent Trends in Linear Text Segmentation: a Survey
Iacopo Ghinassi, Lin Wang, Chris Newell, Matthew Purver

TL;DR
This survey reviews recent advances in linear text segmentation, highlighting new methods, resources, limitations, and future research directions in the context of increasing web-based multimedia content.
Contribution
It provides an extensive overview of current state-of-the-art approaches, resources, and challenges in linear text segmentation, and suggests future research directions.
Findings
Current methods improve segmentation accuracy
Resources are limited and need expansion
Identifies under-explored research areas
Abstract
Linear Text Segmentation is the task of automatically tagging text documents with topic shifts, i.e. the places in the text where the topics change. A well-established area of research in Natural Language Processing, drawing from well-understood concepts in linguistic and computational linguistic research, the field has recently seen a lot of interest as a result of the surge of text, video, and audio available on the web, which in turn requires ways of summarising and categorising the mass of content, for which linear text segmentation is a fundamental step. In this survey, we provide an extensive overview of current advances in linear text segmentation, describing the state of the art in terms of resources and approaches for the task. Finally, we highlight the limitations of available resources and of the task itself, while indicating ways forward based on the most recent literature and…
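To make the task definition concrete, linear text segmentation can be viewed as assigning one binary label per sentence, where a 1 marks the last sentence before a topic shift. The sketch below illustrates that framing only. The function name `labels_to_segments`, the label convention, and the example sentences are assumptions made for illustration and are not taken from the survey.

```python
from typing import List


def labels_to_segments(sentences: List[str], boundaries: List[int]) -> List[List[str]]:
    """Group sentences into topical segments.

    Assumed convention: boundaries[i] == 1 means sentence i is the last
    sentence of its segment, i.e. a topic shift follows it.
    """
    if len(sentences) != len(boundaries):
        raise ValueError("need exactly one boundary label per sentence")

    segments: List[List[str]] = []
    current: List[str] = []
    for sentence, is_boundary in zip(sentences, boundaries):
        current.append(sentence)
        if is_boundary:
            segments.append(current)
            current = []
    # Any trailing sentences without a closing boundary form a final segment.
    if current:
        segments.append(current)
    return segments


# Hypothetical four-sentence document with one topic shift after sentence 2.
sentences = [
    "The match went to extra time.",
    "The home side won on penalties.",
    "Markets opened lower on Monday.",
    "Tech shares led the decline.",
]
boundaries = [0, 1, 0, 1]
segments = labels_to_segments(sentences, boundaries)
```

Under this framing, a segmentation system is anything that predicts the `boundaries` list from the sentences alone.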
Taxonomy
Topics: Natural Language Processing Techniques · Text and Document Classification Technologies · Topic Modeling
