PPTArena: A Benchmark for Agentic PowerPoint Editing

Michael Ofengenden; Yunze Man; Ziqi Pang; Yu-Xiong Wang

arXiv:2512.03042 · cs.CV · December 9, 2025

TL;DR

PPTArena introduces a comprehensive benchmark for evaluating PowerPoint editing agents on real slide decks, emphasizing in-place modifications guided by natural language, and proposes a structure-aware agent that significantly outperforms existing systems.

Contribution

The paper presents PPTArena, a new benchmark for PowerPoint editing, and introduces PPTPilot, a novel structure-aware editing agent that improves editing accuracy and visual fidelity.

Findings

1. PPTPilot outperforms proprietary agents by over 10 percentage points.

2. The benchmark reveals current agents struggle with long-horizon, document-scale tasks.

3. PPTArena enables evaluation of in-place slide editing under natural language instructions.

Abstract

We introduce PPTArena, a benchmark for PowerPoint editing that measures reliable modifications to real slides under natural-language instructions. In contrast to image-PDF renderings or text-to-slide generation, PPTArena focuses on in-place editing across 100 decks, 2125 slides, and over 800 targeted edits covering text, charts, tables, animations, and master-level styles. Each case includes a ground-truth deck, a fully specified target outcome, and a dual VLM-as-judge pipeline that separately scores instruction following and visual quality using both structural diffs and slide images. Building on this setting, we propose PPTPilot, a structure-aware slide-editing agent that plans semantic edit sequences, routes between high-level programmatic tools and deterministic XML operations for precise control, and verifies outputs through an iterative plan-edit-check loop against task-specific…
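The plan-edit-check loop and tool routing described in the abstract can be illustrated with a toy sketch. This is not PPTPilot's implementation: the deck is modeled as a plain dictionary, and every name here (`Edit`, `needs_xml`, `route`, `plan_edit_check`) is a hypothetical stand-in chosen for illustration. The two "tools" do the same thing in this toy; in a real agent they would be a high-level programmatic API and a direct XML rewrite, respectively.

```python
from dataclasses import dataclass
from typing import Callable

# Toy deck model (assumption): slide index -> {property name: value}.
Deck = dict[int, dict[str, str]]


@dataclass
class Edit:
    slide: int
    prop: str
    value: str
    needs_xml: bool = False  # hypothetical flag: True if no high-level tool covers this edit


def apply_high_level(deck: Deck, edit: Edit) -> None:
    # Stand-in for a high-level programmatic tool (e.g. "set the title text").
    deck.setdefault(edit.slide, {})[edit.prop] = edit.value


def apply_xml(deck: Deck, edit: Edit) -> None:
    # Stand-in for a deterministic XML operation on the underlying file.
    deck.setdefault(edit.slide, {})[edit.prop] = edit.value


def route(edit: Edit) -> Callable[[Deck, Edit], None]:
    # Routing step: pick the XML path only when the edit requires it.
    return apply_xml if edit.needs_xml else apply_high_level


def check(deck: Deck, constraints: list[Edit]) -> list[Edit]:
    # Verification step: return the constraints the deck does not yet satisfy.
    return [c for c in constraints if deck.get(c.slide, {}).get(c.prop) != c.value]


def plan_edit_check(deck: Deck, constraints: list[Edit], max_rounds: int = 3) -> bool:
    # Iterate: plan (collect unmet constraints), edit (route and apply), check.
    for _ in range(max_rounds):
        pending = check(deck, constraints)
        if not pending:
            return True
        for edit in pending:
            route(edit)(deck, edit)
    return not check(deck, constraints)


deck = {0: {"title": "Old title"}}
constraints = [
    Edit(0, "title", "New title"),
    Edit(1, "transition", "fade", needs_xml=True),
]
succeeded = plan_edit_check(deck, constraints)
```

In this sketch the "plan" is simply the list of unmet constraints; a real agent would derive a semantic edit sequence from the natural-language instruction before routing each step.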

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Multimodal Machine Learning Applications · Data Visualization and Analytics