PaperX: A Unified Framework for Multimodal Academic Presentation Generation with Scholar DAG

Tao Yu; Minghui Zhang; Zhiqing Cui; Hao Wang; Zhongtian Luo; Shenghua Chai; Junhao Gong; Yuzhao Peng; Yuxuan Zhou; Yujia Yang; Zhenghao Zhang; Haopeng Jin; Xinming Wang; Yufei Xiong; Jiabing Yang; Jiahao Yuan; Hanqing Wang; Hongzhu Yi; Yan Huang; Liang Wang

arXiv:2602.03866 · cs.DL · February 12, 2026

Open Access

TL;DR

PaperX is a unified framework that transforms scientific papers into multimodal presentations using a novel Scholar DAG representation, improving efficiency and quality over existing isolated methods.

Contribution

It introduces Scholar DAG as an intermediate structure for unified multimodal presentation generation, enabling diverse high-quality outputs from a single source.

Findings

1. Achieves state-of-the-art content fidelity and aesthetic quality.
2. Significantly improves cost efficiency over specialized agents.
3. Supports diverse presentation outputs from one source.

Abstract

Transforming scientific papers into multimodal presentation content is essential for research dissemination but remains labor-intensive. Existing automated solutions typically treat each format as an isolated downstream task, leading to redundant processing and semantic inconsistency. We introduce PaperX, a unified framework that models academic presentation generation as a structural transformation and rendering process. Central to our approach is the Scholar DAG, an intermediate representation that decouples the paper's logical structure from its final presentation syntax. By applying adaptive graph traversal strategies, PaperX generates diverse, high-quality outputs from a single source. Comprehensive evaluations demonstrate that our framework achieves state-of-the-art performance in content fidelity and aesthetic quality while significantly improving cost efficiency compared to…
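The idea of one intermediate graph feeding several renderers can be illustrated with a minimal sketch. The abstract does not give the Scholar DAG's node schema or its traversal strategies, so everything below is a hypothetical illustration rather than the authors' implementation: the `Node` fields, the `importance` score, the `traverse` filter, and both renderers are assumptions. Only `graphlib.TopologicalSorter` is a real standard-library API (Python 3.9+).

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass(frozen=True)
class Node:
    """Hypothetical content unit; the paper's real node schema is not given here."""
    name: str
    text: str
    importance: int  # assumed salience score, higher means more central


# Hypothetical nodes extracted from a paper.
NODES = {
    "problem": Node("problem", "Presentation authoring is labor-intensive.", 3),
    "method": Node("method", "An intermediate DAG decouples logic from syntax.", 3),
    "traversal": Node("traversal", "Traversal selects content per format.", 2),
    "results": Node("results", "Fidelity, aesthetics, and cost are evaluated.", 2),
}

# Each key maps to the set of nodes it logically depends on (its predecessors).
DEPENDS_ON = {
    "problem": set(),
    "method": {"problem"},
    "traversal": {"method"},
    "results": {"method"},
}


def traverse(min_importance):
    """Visit nodes in dependency order, keeping only sufficiently important ones.

    The importance threshold stands in for an 'adaptive' strategy (assumption).
    """
    order = TopologicalSorter(DEPENDS_ON).static_order()
    return [NODES[n] for n in order if NODES[n].importance >= min_importance]


def render_slides(nodes):
    """Hypothetical renderer: one slide per selected node."""
    return "\n".join(
        f"[Slide {i}] {n.name}: {n.text}" for i, n in enumerate(nodes, 1)
    )


def render_poster(nodes):
    """Hypothetical renderer: compact section headings for a poster."""
    return " | ".join(n.name.upper() for n in nodes)


if __name__ == "__main__":
    print(render_slides(traverse(min_importance=2)))  # fuller output for slides
    print(render_poster(traverse(min_importance=3)))  # terser output for a poster
```

In this sketch the graph is built once and each output format differs only in its traversal threshold and renderer, which is the decoupling of logical structure from presentation syntax that the abstract describes.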

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Text Analysis Techniques