Exploiting Representation Bias for Data Distillation in Abstractive Text Summarization
Yash Kumar Atri, Vikram Goyal, Tanmoy Chakraborty

TL;DR
This paper investigates the limitations of current abstractive summarization models in capturing input diversity, proposing a data filtering method based on clustering to improve model robustness and summary quality.
Contribution
It introduces a novel approach to analyze and filter training data by discretizing model representations, enhancing diversity and faithfulness in summarization models.
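The filtering idea can be illustrated with a minimal sketch. It assumes k-means as the discretizer and a per-cluster cap as the filtering rule. The paper's exact clustering algorithm, representation (input embeddings or encoder states) and selection criterion may differ. The helpers `kmeans` and `filter_redundant` and the `per_cluster` parameter are hypothetical names used only for illustration.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means (illustrative); returns a cluster id per row of x."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct random rows of x.
    centroids = x[rng.choice(len(x), size=k, replace=False)].astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the previous centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return labels


def filter_redundant(x: np.ndarray, k: int, per_cluster: int, seed: int = 0) -> np.ndarray:
    """Hypothetical filter: keep at most `per_cluster` samples from each cluster.

    Dense (redundant) regions of the representation space are down-sampled,
    while sparse clusters are kept whole. Returns sorted row indices of x.
    """
    labels = kmeans(x, k, seed=seed)
    rng = np.random.default_rng(seed)
    kept = []
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        take = min(per_cluster, len(idx))
        # Sample without replacement inside the cluster.
        kept.extend(rng.choice(idx, size=take, replace=False).tolist())
    return np.sort(np.array(kept))
```

In a summarization setting, each row of `x` would be one vector per training document, for example a pooled encoder state. The returned indices would then select the filtered training subset. Capping samples per cluster is only one plausible way to remove redundancy. Keeping the points nearest each centroid would be an equally simple alternative.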
Findings
Filtering redundant data improves model robustness.
Discretizing representations reveals that models capture limited input diversity.
Data filtering enhances summary quality metrics.
Abstract
The number of training samples for abstractive text summarization is surging to cater to the needs of deep learning models. These models tend to exploit the representations of the training data to attain superior performance, improving the quantitative aspects of the resulting summaries. However, increasing the size of the training set is not always the ideal way to maximize performance, so the quality of training samples and the learning protocol of deep learning models need to be revisited. In this paper, we aim to discretize the vector space of abstractive text summarization models to understand the characteristics learned between the input embedding space and the models' encoder space. We show that deep models fail to capture the diversity of the input space. Further, the distribution of data points on the encoder space indicates that an unchecked…
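After discretization, one hedged way to quantify "captured diversity" is the normalized Shannon entropy of cluster occupancy. A value of 1.0 means points spread evenly over all clusters. Values near 0 mean they collapse into a few clusters. The function `occupancy_entropy` is an illustrative choice, not necessarily the measure used in the paper. The cluster labels are assumed to come from discretizing the input embedding space and the encoder space separately.

```python
import numpy as np


def occupancy_entropy(labels: np.ndarray, k: int) -> float:
    """Normalised entropy of cluster occupancy in [0, 1]; requires k >= 2.

    Illustrative diversity measure: compare the score of input-space labels
    with that of encoder-space labels for the same documents.
    """
    counts = np.bincount(labels, minlength=k).astype(float)
    p = counts / counts.sum()
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log(p)).sum() / np.log(k))


# Hypothetical labels: inputs spread over 4 clusters, encoder states collapsed.
input_labels = np.array([0, 1, 2, 3] * 25)
encoder_labels = np.array([0] * 90 + [1] * 10)
print(occupancy_entropy(input_labels, 4), occupancy_entropy(encoder_labels, 4))
```

Suppose the encoder-space score were clearly lower than the input-space score. That would be consistent with the abstract's claim that the models fail to capture input diversity, under the assumptions above.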
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Sparse Evolutionary Training · Focus
