Scalable and Domain-General Abstractive Proposition Segmentation
Mohammad Javad Hosseini, Yang Gao, Tim Baumgärtner, Alex Fabrikant, Reinald Kim Amplayo

TL;DR
This paper introduces a scalable, supervised approach to abstractive proposition segmentation using fine-tuned large language models, improving accuracy and domain generalization for NLP applications.
Contribution
It presents a supervised training method for proposition segmentation with LLMs, leveraging teacher-student models and synthetic data for improved scalability and domain adaptation.
Findings
Supervised training with annotated datasets improves segmentation accuracy.
Using teacher models to generate synthetic data enables training smaller, effective models.
The approach demonstrates strong domain generalization capabilities.
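The teacher-to-student step in the findings above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `build_synthetic_dataset`, `toy_teacher`, and the sample passage are hypothetical names and data introduced here, and the toy teacher is a trivial string splitter standing in for a fine-tuned teacher LLM (it is extractive, whereas the paper's task is abstractive).

```python
from typing import Callable, Iterable, List, Tuple

# A segmenter maps a passage to a list of proposition sentences.
Segmenter = Callable[[str], List[str]]
Example = Tuple[str, List[str]]


def build_synthetic_dataset(teacher: Segmenter, passages: Iterable[str]) -> List[Example]:
    """Label unannotated passages with the teacher's output (hypothetical helper)."""
    dataset: List[Example] = []
    for passage in passages:
        # Drop empty propositions and surrounding whitespace.
        propositions = [p.strip() for p in teacher(passage) if p.strip()]
        if propositions:  # skip passages for which the teacher produced nothing
            dataset.append((passage, propositions))
    return dataset


def toy_teacher(passage: str) -> List[str]:
    """Hypothetical stand-in for a teacher LLM: splits a sentence on ' and '."""
    parts = passage.rstrip(".").split(" and ")
    return [part.strip() + "." for part in parts if part.strip()]


# Unlabeled text; the empty passage is filtered out.
unlabeled = ["Alice wrote the parser and Bob wrote the tests.", ""]
synthetic = build_synthetic_dataset(toy_teacher, unlabeled)
```

The resulting `(passage, propositions)` pairs would then serve as supervised training data for a smaller student model; the fine-tuning step itself is not shown, since the summary here does not specify it.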
Abstract
Segmenting text into fine-grained units of meaning is important to a wide range of NLP applications. The default approach of segmenting text into sentences is often insufficient, especially since sentences are usually complex enough to include multiple units of meaning that merit separate treatment in the downstream task. We focus on the task of abstractive proposition segmentation (APS): transforming text into simple, self-contained, well-formed sentences. Several recent works have demonstrated the utility of proposition segmentation with few-shot prompted LLMs for downstream tasks such as retrieval-augmented grounding and fact verification. However, this approach does not scale to large amounts of text and may not always extract all the facts from the input text. In this paper, we first introduce evaluation metrics for the task to measure several dimensions of quality. We then propose…
Code & Models
- 🤗 google/gemma-2b-aps-it · model · 84 dl · ♡ 20
- 🤗 google/gemma-7b-aps-it · model · 958 dl · ♡ 42
- 🤗 RichardErkhov/google_-_gemma-2b-aps-it-gguf · model · 1.0k dl
- 🤗 QuantFactory/gemma-2b-aps-it-GGUF · model · 41 dl · ♡ 1
- 🤗 QuantFactory/gemma-7b-aps-it-GGUF · model · 139 dl · ♡ 3
- 🤗 RichardErkhov/google_-_gemma-2b-aps-it-4bits · model
- 🤗 RichardErkhov/google_-_gemma-2b-aps-it-8bits · model
- 🤗 RichardErkhov/google_-_gemma-7b-aps-it-gguf · model · 107 dl
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · AI-based Problem Solving and Planning
Methods: Focus
