DAGverse: Building Document-Grounded Semantic DAGs from Scientific Papers
Shu Wan, Saketh Vishnubhatla, Iskander Kushbay, Tom Heffernan, Aaron Belikoff, Raha Moraffah, Huan Liu

TL;DR
DAGverse is a framework that constructs semantic directed acyclic graphs from scientific papers, leveraging figures and text to enable structured reasoning grounded in real-world evidence.
Contribution
We introduce DAGverse, a semi-automatic system for extracting high-quality semantic DAGs from scientific documents, together with a new dataset and methods that outperform existing models.
Findings
DAGverse-Pipeline achieves high precision in DAG construction.
The dataset DAGverse-1 contains 108 expert-validated DAGs.
Our system outperforms existing Vision-Language Models in DAG classification.
Abstract
Directed Acyclic Graphs (DAGs) are widely used to represent structured knowledge in scientific and technical domains. However, datasets for real-world DAGs remain scarce because constructing them typically requires expert interpretation of domain documents. We study Doc2SemDAG construction: recovering a preferred semantic DAG from a document together with the cited evidence and context that explain it. This problem is challenging because a document may admit multiple plausible abstractions, the intended structure is often implicit, and the supporting evidence is scattered across prose, equations, captions, and figures. To address these challenges, we leverage scientific papers containing explicit DAG figures as a natural source of supervision. In this setting, the DAG figure provides the DAG structure, while the accompanying text provides context and explanation. We introduce DAGverse,…
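To make the target object concrete, here is a minimal sketch of what a document-grounded semantic DAG could look like as a data structure: nodes with descriptions, directed edges that carry the text evidence supporting them, and an acyclicity check. The class names `Edge` and `SemanticDAG`, their fields, and the toy example are assumptions made for illustration; they are not the paper's schema or the DAGverse pipeline.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Edge:
    """Hypothetical edge record: a directed link plus the passages that support it."""
    source: str
    target: str
    evidence: list = field(default_factory=list)  # quoted sentences, captions, etc.


@dataclass
class SemanticDAG:
    """Hypothetical container; not the schema used by DAGverse."""
    nodes: dict = field(default_factory=dict)  # node name -> short description
    edges: list = field(default_factory=list)

    def add_edge(self, source, target, evidence=()):
        # Register both endpoints so every edge refers to a known node.
        for name in (source, target):
            self.nodes.setdefault(name, "")
        self.edges.append(Edge(source, target, list(evidence)))

    def topological_order(self):
        """Return a topological order, or None if a cycle exists (Kahn's algorithm)."""
        indegree = {name: 0 for name in self.nodes}
        children = {name: [] for name in self.nodes}
        for edge in self.edges:
            indegree[edge.target] += 1
            children[edge.source].append(edge.target)
        queue = deque(name for name, deg in indegree.items() if deg == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    queue.append(child)
        # Nodes on a cycle never reach indegree 0, so they are missing from `order`.
        return order if len(order) == len(self.nodes) else None

    def is_acyclic(self):
        return self.topological_order() is not None


# Toy example with made-up variables and evidence strings.
dag = SemanticDAG()
dag.add_edge("exposure", "mediator", ["Figure caption: exposure affects mediator."])
dag.add_edge("mediator", "outcome", ["Section text: mediator drives outcome."])
print(dag.is_acyclic())
```

Keeping the evidence on each edge, rather than in a separate table, is one simple way to keep structure and grounding together; a real extraction system would likely also record where in the document each passage came from.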
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Multimodal Machine Learning Applications
