Explaining Mixtures of Sources in News Articles
Alexander Spangher, James Youn, Matt DeButts, Nanyun Peng, Emilio Ferrara, Jonathan May

TL;DR
This paper investigates how journalists select sources for news articles. It models source selection as a planning schema, develops metrics to identify the schema underlying an article, and shows that these schemas can be predicted from headlines, a step toward long-form AI generation.
Contribution
It introduces a framework for understanding journalistic source plans using adapted schemas, develops metrics to identify the plan underlying an article, and shows that plans can be predicted from article headlines.
Findings
Stance and social affiliation schemas best explain source plans.
Textual entailment schema explains plans in factual topics.
Schema prediction from headlines is reasonably accurate.
Abstract
Human writers plan, then write. For large language models (LLMs) to play a role in longer-form article generation, we must understand the planning steps humans make before writing. We explore one kind of planning, source-selection in news, as a case-study for evaluating plans in long-form generation. We ask: why do specific stories call for specific kinds of sources? We imagine a generative process for story writing where a source-selection schema is first selected by a journalist, and then sources are chosen based on categories in that schema. Learning the article's plan means predicting the schema initially chosen by the journalist. Working with professional journalists, we adapt five existing schemata and introduce three new ones to describe journalistic plans for the inclusion of sources in documents. Then, inspired by Bayesian latent-variable modeling, we develop metrics to select…
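The generative story in the abstract (a schema is chosen first, then sources are drawn from that schema's categories, and learning the plan means recovering the schema) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the schema names, category labels, probabilities, and the functions `sample_plan`, `log_likelihood`, and `infer_schema` are hypothetical, and the maximum-likelihood choice stands in for the paper's metrics, which the truncated abstract does not specify.

```python
import math
import random

# Hypothetical schemata with made-up category probabilities (not from the paper).
SCHEMATA = {
    "stance": {"supports": 0.4, "opposes": 0.4, "neutral": 0.2},
    "affiliation": {"government": 0.5, "academic": 0.3, "witness": 0.2},
}


def sample_plan(n_sources, rng):
    """Generative process: pick a schema, then draw one category per source."""
    schema = rng.choice(sorted(SCHEMATA))
    categories = SCHEMATA[schema]
    labels = rng.choices(
        list(categories), weights=list(categories.values()), k=n_sources
    )
    return schema, labels


def log_likelihood(labels, schema):
    """Log-probability of the observed source labels under one schema."""
    categories = SCHEMATA[schema]
    total = 0.0
    for label in labels:
        if label not in categories:
            return float("-inf")  # this schema cannot produce the label
        total += math.log(categories[label])
    return total


def infer_schema(labels):
    """Recover the latent schema as the one that best explains the labels."""
    return max(sorted(SCHEMATA), key=lambda s: log_likelihood(labels, s))


if __name__ == "__main__":
    true_schema, source_labels = sample_plan(4, random.Random(0))
    print(true_schema, source_labels, infer_schema(source_labels))
```

Because the two toy schemata use disjoint labels, inference here is trivially exact. With real articles, every source would carry a label under each candidate schema, and the comparison between schemata is where a principled metric is needed.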
Taxonomy
Topics: Misinformation and Its Impacts · Media Influence and Politics · Hate Speech and Cyberbullying Detection
