# NEWSAGENT: Benchmarking Multimodal Agents as Journalists with Real-World Newswriting Tasks

**Authors:** Yen-Che Chien, Kuang-Da Wang, Wei-Yao Wang, Wen-Chih Peng

arXiv: 2509.00446 · 2025-09-03

## TL;DR

NEWSAGENT is a benchmark designed to evaluate multimodal agents' ability to perform real-world newswriting tasks, including data retrieval, content selection, and article generation, highlighting current capabilities and challenges.

## Contribution

The paper introduces NEWSAGENT, a novel benchmark with real news data to assess multimodal agents' performance in journalistic tasks involving information discovery and content creation.

## Key findings

- Agents can retrieve relevant facts from multimodal data.
- Agents struggle with planning and narrative integration.
- NEWSAGENT provides a realistic testbed for agent evaluation.

## Abstract

Recent advances in autonomous digital agents from industry (e.g., Manus AI and Gemini's research mode) highlight their potential for handling structured tasks through autonomous decision-making and task decomposition; however, it remains unclear to what extent agent-based systems can improve productivity on multimodal web data. We study this in the realm of journalism, which requires iterative planning, interpretation, and contextual reasoning over multimodal raw content to form a well-structured news article. We introduce NEWSAGENT, a benchmark for evaluating how agents can automatically search available raw content, select the desired information, and edit and rephrase it into a news article by accessing core journalistic functions. Given a writing instruction and firsthand data, mirroring how a journalist begins a news draft, agents are tasked with identifying narrative perspectives, issuing keyword-based queries, retrieving historical background, and generating complete articles. Unlike typical summarization or retrieval tasks, essential context is not directly available and must be actively discovered, reflecting the information gaps faced in real-world news writing. NEWSAGENT includes 6k human-verified examples derived from real news, with multimodal content converted to text for broad model compatibility. We evaluate open- and closed-source LLMs with commonly used agentic frameworks on NEWSAGENT; the results show that agents are capable of retrieving relevant facts but struggle with planning and narrative integration. We believe that NEWSAGENT serves as a realistic testbed for iterating on and evaluating agent capabilities in manipulating multimodal web data for real-world productivity.
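The task loop described in the abstract (start from an instruction and firsthand data, issue keyword queries, retrieve background, then write) can be illustrated with a toy sketch. This is not the benchmark's code or API: `Document`, `Draft`, `propose_queries`, `search`, and `write_article` are hypothetical names, the archive is made-up data, and simple substring counting and string concatenation stand in for the LLM-driven planning and generation the paper evaluates.

```python
from dataclasses import dataclass, field

# Hypothetical stopword list used only to turn an instruction into query terms.
STOPWORDS = {"write", "about", "the", "a", "an", "of", "on"}


@dataclass
class Document:
    """One text-converted item in a hypothetical archive of raw content."""
    doc_id: str
    text: str


@dataclass
class Draft:
    """Working state carried through the illustrative newswriting loop."""
    instruction: str
    firsthand: str
    queries: list = field(default_factory=list)
    background: list = field(default_factory=list)
    article: str = ""


def propose_queries(instruction: str) -> list:
    """Stand-in for perspective identification: keep the content words."""
    words = [w.strip(".,").lower() for w in instruction.split()]
    return [w for w in words if w and w not in STOPWORDS]


def search(archive: list, query: str, top_k: int = 2) -> list:
    """Keyword search: rank documents by how often the query string occurs."""
    scored = [(doc.text.lower().count(query), doc) for doc in archive]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits[:top_k]]


def write_article(instruction: str, firsthand: str, archive: list) -> Draft:
    """Run the loop: plan queries, retrieve background, assemble a draft."""
    draft = Draft(instruction, firsthand)
    draft.queries = propose_queries(instruction)
    seen = set()
    for query in draft.queries:
        for doc in search(archive, query):
            if doc.doc_id not in seen:  # avoid repeating the same source
                seen.add(doc.doc_id)
                draft.background.append(doc)
    context = " ".join(doc.text for doc in draft.background)
    # A real agent would rewrite and integrate; here we only concatenate.
    draft.article = f"{firsthand} Background: {context}".strip()
    return draft


# Made-up archive and inputs, for illustration only.
archive = [
    Document("a1", "The river flooded the same district in 2019."),
    Document("a2", "City council approved a new stadium budget."),
    Document("a3", "Flood barriers were proposed after the 2019 flood."),
]
draft = write_article(
    "Write about the flood response",
    "Heavy rain closed three roads on Monday.",
    archive,
)
print(draft.queries)
print([doc.doc_id for doc in draft.background])
print(draft.article)
```

The sketch makes one point from the abstract concrete: the background documents are not handed to the writer and only enter the draft if a query surfaces them, so weak query planning directly limits what the article can contain.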

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2509.00446/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2509.00446/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/2509.00446/full.md

---
Source: https://tomesphere.com/paper/2509.00446