Sentence segmentation: open the quotation marks without losing the thread (original French title: Segmentation en phrases : ouvrez les guillemets sans perdre le fil)
Sandrine Ollinger (ATILF), Denis Maurel

TL;DR
This paper introduces a graph cascade for segmenting XML documents into sentences. It allows sentences to nest inside quotations, gives specific treatment to parenthetical insertions and colon-introduced lists, and compares its results with those available in 2019 on the same dataset.
Contribution
The paper proposes a graph cascade approach to sentence segmentation of XML documents that produces nested sentences for passages introduced by quotation marks or dashes, and that handles parenthetical insertions and colon-introduced lists as special cases.
Findings
Nested sentences produced for passages introduced by quotation marks and dashes
Comparison with the results available in 2019 on the same dataset
Dedicated handling of parenthetical insertions and lists introduced by colons
Abstract
This paper presents a graph cascade for segmenting XML documents into sentences. Our proposal allows sentences to appear inside sentences when they are introduced by quotation marks or dashes, and pays particular attention to parenthetical insertions and to lists introduced by colons. We describe how the tool works, compare its results with those available in 2019 on the same dataset, and evaluate the system's performance on a test corpus.
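The abstract does not spell out the cascade's rules, so the following is only a rough illustration of what "sentences inside sentences" means in the output. It is a minimal Python sketch, assuming plain-text input, French guillemets « » as the only quotation marks, and `.`, `!`, `?` as the only terminators. It is a simple stack-based scanner, not the authors' graph cascade; the function name `segment` and the `<s>` tag are hypothetical choices, and dashes and colon-introduced lists are not handled.

```python
from xml.sax.saxutils import escape

# Assumption: sentence-final punctuation is limited to these three characters.
TERMINATORS = {".", "!", "?"}


def segment(text: str) -> str:
    """Wrap sentences in <s> tags, nesting the sentences quoted between « and ».

    Hypothetical sketch: terminators inside parentheses never end a sentence,
    so parenthetical insertions such as "(p. 3)" stay in the enclosing sentence.
    """
    out = []            # output pieces, joined at the end
    s_open = [False]    # one "sentence is open" flag per level: top + each open «
    stack = []          # currently open delimiters, innermost last
    for i, ch in enumerate(text):
        in_paren = bool(stack) and stack[-1] == "("
        # Closing quote: end the nested sentence (if still open), leave the level.
        if ch == "»" and stack and stack[-1] == "«":
            if s_open.pop():
                out.append("</s>")
            stack.pop()
            out.append(ch)
            continue
        # Closing parenthesis: the enclosing sentence simply continues.
        if ch == ")" and in_paren:
            stack.pop()
            out.append(ch)
            continue
        # First visible character at this level starts a new sentence.
        if not ch.isspace() and not s_open[-1] and not in_paren:
            out.append("<s>")
            s_open[-1] = True
        out.append(escape(ch))  # escape &, <, > so the output stays valid XML
        if ch == "«":
            stack.append("«")
            s_open.append(False)  # the quote opens a new sentence level
        elif ch == "(":
            stack.append("(")
        elif ch in TERMINATORS and not in_paren:
            nxt = text[i + 1] if i + 1 < len(text) else " "
            # Only split before whitespace or a closing quote, so "..." is not cut apart.
            if nxt.isspace() or nxt == "»":
                out.append("</s>")
                s_open[-1] = False
    # Close whatever is still open (e.g. a quotation left unterminated).
    out.extend("</s>" for flag in reversed(s_open) if flag)
    return "".join(out)
```

For example, `segment("« Bonjour. » dit-il.")` yields `<s>« <s>Bonjour.</s> » dit-il.</s>`, with the quoted sentence nested in the outer one, while `segment("Voir (p. 3) ici.")` remains a single sentence. A stack keeps the sketch short, but it always lets the outer sentence resume after a closing quote, which is wrong whenever the quotation itself ends the sentence.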
Taxonomy
Topics: Linguistics and Discourse Analysis
Methods: Graph cascade (rule-based sentence segmentation)
