Unraveling the Capabilities of Language Models in News Summarization
Abdurrahman Odabaşı, Göksel Biricik

TL;DR
This paper benchmarks 20 recent language models on news summarization. Larger models such as GPT-3.5-Turbo and GPT-4 excel, some smaller models also show promising results, and few-shot demonstration examples do not always improve performance.
Contribution
It provides a comprehensive evaluation of various language models in zero-shot and few-shot settings for news summarization, highlighting the impact of reference quality and model capabilities.
Findings
GPT-3.5-Turbo and GPT-4 outperform others in summarization quality.
Few-shot demonstrations sometimes worsen results due to poor reference summaries.
Certain smaller models like Qwen1.5-7B show competitive performance.
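To make the zero-shot versus few-shot distinction concrete, here is a minimal sketch of how the two prompt types might be assembled. The instruction wording, the `Article:`/`Summary:` template, and the `build_prompt` helper are illustrative assumptions, not the prompts used in the paper.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical instruction text; the paper's actual prompt is not shown here.
INSTRUCTION = "Summarize the following news article in a few sentences."


def build_prompt(
    article: str,
    demonstrations: Optional[Sequence[Tuple[str, str]]] = None,
) -> str:
    """Build a summarization prompt.

    With no demonstrations this is a zero-shot prompt; with one or more
    (article, reference summary) pairs it becomes a few-shot prompt.
    """
    parts = [INSTRUCTION, ""]
    # Each demonstration pairs an article with its reference summary.
    for demo_article, demo_summary in demonstrations or []:
        parts += [f"Article: {demo_article}", f"Summary: {demo_summary}", ""]
    # The target article comes last, with the summary left for the model.
    parts += [f"Article: {article}", "Summary:"]
    return "\n".join(parts)


zero_shot = build_prompt("Text of the article to summarize.")
few_shot = build_prompt(
    "Text of the article to summarize.",
    demonstrations=[("Text of an example article.", "Its reference summary.")],
)
```

Because each demonstration copies a dataset reference summary into the prompt, a low-quality reference is passed to the model as an example to imitate, which is consistent with the finding that demonstrations sometimes worsen results.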
Abstract
Given the recent introduction of multiple language models and the ongoing demand for improved Natural Language Processing tasks, particularly summarization, this work provides a comprehensive benchmarking of 20 recent language models, focusing on smaller ones for the news summarization task. In this work, we systematically test the capabilities and effectiveness of these models in summarizing news article texts which are written in different styles and presented in three distinct datasets. Specifically, we focus in this study on zero-shot and few-shot learning settings and we apply a robust evaluation methodology that combines different evaluation concepts including automatic metrics, human evaluation, and LLM-as-a-judge. Interestingly, including demonstration examples in the few-shot learning setting did not enhance models' performance and, in some cases, even led to worse quality of…
Taxonomy
Topics: Topic Modeling · Advanced Text Analysis Techniques · Natural Language Processing Techniques
Methods: Attention Is All You Need · Label Smoothing · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Transformer · Attention Dropout · Linear Layer · Dense Connections
