Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline
Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

TL;DR
This paper introduces a supervised, proposition-level alignment task for summarization, creating new datasets and a baseline model that outperforms heuristic methods in alignment accuracy.
Contribution
It formalizes alignment as a supervised classification task at the proposition level and provides new datasets and a baseline model for this task.
Findings
The supervised model improves alignment quality over heuristic unsupervised methods.
New datasets enable better training and evaluation of alignment models.
Proposition-level alignment enhances summarization training data quality.
Abstract
Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
