Rule-Based Approaches to Atomic Sentence Extraction
Lineesha Kamana, Akshita Ananda Subramanian, Mehuli Ghosh, and Suman Saha

TL;DR
This paper analyzes how complex sentence structures impact the accuracy of rule-based atomic sentence extraction, highlighting challenges posed by various syntactic forms and providing insights into improving interpretability.
Contribution
It offers a principled analysis of syntactic factors affecting rule-based extraction performance, using dependency rules and standard datasets to identify specific structural challenges.
Findings
Rule-based extraction achieves moderate accuracy on complex sentences.
Relative clauses and passive constructions are particularly challenging.
Syntactic complexity significantly influences extraction success.
Abstract
Natural language often combines multiple ideas into complex sentences. Atomic sentence extraction, the task of decomposing complex sentences into simpler sentences that each express a single idea, improves performance in information retrieval, question answering, and automated reasoning systems. Previous work has formalized the "split-and-rephrase" task and established evaluation metrics, and machine learning approaches using large language models have improved extraction accuracy. However, these methods lack interpretability and provide limited insight into which linguistic structures cause extraction failures. Although some studies have explored dependency-based extraction of subject-verb-object triples and clauses, no principled analysis has examined which specific clause structures and dependencies lead to extraction difficulties. This study addresses this gap by analyzing how…
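To make the idea of dependency-based atomic sentence extraction concrete, here is a minimal sketch of one such rule: splitting verbs coordinated with the root into separate sentences. It is not the paper's rule set. The `Token` class, the `split_coordinated_clauses` function, and the hand-annotated example parse are all hypothetical, and the relation labels (`nsubj`, `obj`, `conj`, `cc`, `punct`) are assumed to follow Universal Dependencies conventions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int      # position in the sentence
    text: str
    head: int     # index of the governing token, -1 for the root
    deprel: str   # dependency relation to the head


def subtree(tokens, head_idx, blocked):
    """Collect indices under head_idx without descending into blocked nodes."""
    keep = {head_idx}
    stack = [head_idx]
    while stack:
        cur = stack.pop()
        for t in tokens:
            if t.head == cur and t.idx not in blocked:
                keep.add(t.idx)
                stack.append(t.idx)
    return keep


def split_coordinated_clauses(tokens):
    """Emit one sentence per clause head: the root plus its `conj` dependents."""
    root = next(t for t in tokens if t.head == -1)
    heads = [root.idx] + [
        t.idx for t in tokens if t.deprel == "conj" and t.head == root.idx
    ]
    sentences = []
    for h in heads:
        # Keep each clause from absorbing the other clause heads' subtrees.
        blocked = set(heads) - {h}
        keep = subtree(tokens, h, blocked)
        words = [
            t.text
            for t in tokens
            if t.idx in keep and t.deprel not in {"cc", "punct"}
        ]
        sentences.append(" ".join(words) + ".")
    return sentences


# Hand-annotated parse (an assumption, not parser output) of:
# "Alice wrote the report and Bob reviewed it."
EXAMPLE = [
    Token(0, "Alice", 1, "nsubj"),
    Token(1, "wrote", -1, "root"),
    Token(2, "the", 3, "det"),
    Token(3, "report", 1, "obj"),
    Token(4, "and", 6, "cc"),
    Token(5, "Bob", 6, "nsubj"),
    Token(6, "reviewed", 1, "conj"),
    Token(7, "it", 6, "obj"),
    Token(8, ".", 1, "punct"),
]

print(split_coordinated_clauses(EXAMPLE))
```

This single rule handles only clause-level coordination where each clause has its own subject. It does not copy a shared subject into the second clause ("Alice wrote and reviewed the report"), and it does nothing for relative clauses or passives, which are the kinds of structures the findings above identify as difficult.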
Taxonomy
Topics: Text Readability and Simplification · Topic Modeling · Natural Language Processing Techniques
