Measuring Large Language Models Capacity to Annotate Journalistic Sourcing

Subramaniam Vincent; Phoebe Wang; Zhan Shi; Sahas Koka; Yi Fang

arXiv:2501.00164 · cs.CL · April 4, 2025

Open Access · 1 dataset

TL;DR

This paper proposes a benchmark to evaluate large language models' ability to identify and annotate sourcing in journalistic stories, highlighting current limitations and the need for improved transparency in AI-assisted journalism.

Contribution

It introduces a novel scenario, dataset, and metrics for assessing LLMs on sourcing annotation in journalism, addressing a gap in existing benchmarks.

Findings

01. LLMs currently struggle to identify all sourced statements.

02. Matching source types remains challenging for LLMs.

03. Spotting source justifications is particularly difficult.

Abstract

Since the launch of ChatGPT in late 2022, the capacities of Large Language Models and their evaluation have been under constant discussion in both academic research and industry. Scenarios and benchmarks have been developed in several areas such as law, medicine and math (Bommasani et al., 2023), and model variants are evaluated continuously. One area that has not received sufficient scenario development attention is journalism, and in particular journalistic sourcing and ethics. Journalism is a crucial truth-determination function in democracy (Vincent, 2023), and sourcing is a central pillar of all original journalistic output. Evaluating the capacities of LLMs to annotate stories for the different signals of sourcing and how reporters justify them is an important scenario that warrants a benchmark approach. It offers potential to build automated systems to…

Code & Models

Datasets

subbuvincent/llms-journ-sourcing
dataset · 118 dl

Taxonomy

Topics

Computational and Text Analysis Methods

Methods

Softmax · Attention Is All You Need