On Uncertainty Calibration and Selective Generation in Probabilistic Neural Summarization: A Benchmark Study
Polina Zablotskaia, Du Phan, Joshua Maynez, Shashi Narayan, Jie Ren, Jeremiah Liu

TL;DR
This study evaluates various probabilistic deep learning methods for neural summarization, demonstrating their ability to improve uncertainty calibration and selective generation, while highlighting their limitations across diverse benchmarks.
Contribution
It provides a comprehensive benchmark analysis of probabilistic methods in neural summarization, revealing their strengths and failure modes in uncertainty calibration and selective abstention.
Findings
Probabilistic methods improve uncertainty calibration and summary quality.
They enhance selective generation by abstaining from low-quality outputs.
Certain methods like Deep Ensemble and Monte Carlo Dropout have notable failure patterns.
Abstract
Modern deep models for summarization attain impressive benchmark performance, but they are prone to generating miscalibrated predictive uncertainty. This means that they assign high confidence to low-quality predictions, leading to compromised reliability and trustworthiness in real-world applications. Probabilistic deep learning methods are common solutions to the miscalibration problem. However, their relative effectiveness in complex autoregressive summarization tasks is not well understood. In this work, we thoroughly investigate the effectiveness of different state-of-the-art probabilistic methods in improving the uncertainty quality of neural summarization models, across three large-scale benchmarks of varying difficulty. We show that the probabilistic methods consistently improve the model's generation and uncertainty quality, leading to improved selective generation…
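To make the notion of miscalibration concrete, here is a minimal sketch of a binned calibration error. It assumes confidences and quality scores both lie in [0, 1] and that quality stands in for correctness; the function name `expected_calibration_error` and the choice of ten equal-width bins are illustrative assumptions, and the excerpt above does not state which calibration metric the authors use.

```python
def expected_calibration_error(confidences, qualities, n_bins=10):
    """Weighted average gap between mean confidence and mean quality per bin.

    Illustrative assumption: both inputs are scalars in [0, 1], with
    quality standing in for correctness; not the paper's stated metric.
    """
    if len(confidences) != len(qualities) or not confidences:
        raise ValueError("need two non-empty lists of equal length")

    # Group (confidence, quality) pairs into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for c, q in zip(confidences, qualities):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, q))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        mean_qual = sum(q for _, q in members) / len(members)
        # Each bin contributes its gap, weighted by its share of the examples.
        ece += (len(members) / total) * abs(mean_conf - mean_qual)
    return ece


# Toy data (made up): an overconfident model, with high confidence and low quality.
overconfident = expected_calibration_error([0.95, 0.95], [0.2, 0.4])
```

A well-calibrated model yields a value near zero, while a model that assigns high confidence to low-quality summaries yields a large gap.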
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
