LLM-Generated Negative News Headlines Dataset: Creation and Benchmarking Against Real Journalism
Olusola Babalola, Bolanle Ojokoh, and Olutayo Boyinbode

TL;DR
This paper introduces a synthetic dataset of negative news headlines generated by LLMs, validated through expert review and benchmarking, to support NLP tasks while addressing data privacy issues.
Contribution
It presents a novel method for creating and validating a large-scale synthetic negative news headline dataset using LLMs, with comprehensive benchmarking against real headlines.
Findings
Synthetic headlines align well with real headlines in content and tone.
The dataset shows high correlation with real news in embedding space.
Proper noun usage diverges slightly from real headlines.
Abstract
This research examines the potential of datasets generated by Large Language Models (LLMs) to support Natural Language Processing (NLP) tasks, aiming to overcome challenges related to data acquisition and privacy concerns associated with real-world data. Focusing on negative valence text, a critical component of sentiment analysis, we explore the use of LLM-generated synthetic news headlines as an alternative to real-world data. A specialized corpus of negative news headlines was created using tailored prompts to capture diverse negative sentiments across various societal domains. The synthetic headlines were validated by expert review and further analyzed in embedding space to assess their alignment with real-world negative news in terms of content, tone, length, and style. Key metrics such as correlation with real headlines, perplexity, coherence, and realism were evaluated. The…
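As a rough illustration of what an embedding-space comparison between synthetic and real headlines can look like, the sketch below computes the cosine similarity between the centroids of two small headline sets. It is not the authors' pipeline. The embedding model and correlation measure used in the paper are not stated here, so a toy bag-of-words vector stands in for a real sentence embedding. The function names (`bow_vector`, `centroid`, `cosine`) and the example headlines are hypothetical.

```python
import math
from collections import Counter


def bow_vector(headline):
    """Toy stand-in for an embedding: lowercase bag-of-words counts.

    Assumption: a real study would use a learned sentence-embedding model.
    """
    return Counter(headline.lower().split())


def centroid(headlines):
    """Average the toy vectors of a list of headlines into one sparse vector."""
    if not headlines:
        return {}
    total = Counter()
    for h in headlines:
        total.update(bow_vector(h))
    n = len(headlines)
    return {tok: count / n for tok, count in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical headlines, invented for illustration only (not from the dataset).
synthetic = [
    "Flood damages homes in coastal town",
    "Factory closure leaves hundreds without jobs",
]
real = [
    "Storm flood damages hundreds of homes",
    "Plant closure costs town hundreds of jobs",
]

similarity = cosine(centroid(synthetic), centroid(real))
print(f"centroid cosine similarity: {similarity:.3f}")
```

A score near 1 would suggest that the two sets use similar vocabulary on average. With a learned embedding model in place of `bow_vector`, the same centroid comparison would reflect semantic rather than purely lexical overlap.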
Taxonomy
Topics: Computational and Text Analysis Methods · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining
