The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes

Vladimir Baulin, Austin Cook, Daniel Friedman, Janna Lumiruusu, Andrew Pashea, Shagor Rahman, and Benedikt Waldeck

arXiv:2505.17500 · cond-mat.soft · May 26, 2025

TL;DR

The paper presents the Discovery Engine, a framework that transforms scientific literature into a structured, tensor-based knowledge representation enabling AI-driven exploration, hypothesis generation, and accelerated scientific discovery.

Contribution

It introduces an AI-driven framework that converts disconnected publications into a unified, tensor-based knowledge landscape in which a scientific domain can be explored and analyzed computationally.

Findings

01. Creates a high-dimensional Conceptual Tensor that represents scientific knowledge
02. Enables dynamic exploration through interpretable views such as knowledge graphs
03. Supports AI agents in identifying connections and gaps in the literature
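Findings 02 and 03 can be made concrete with a small sketch. It assumes, purely for illustration, a sparse tensor whose nonzero entries are keyed by (publication, concept, property) labels. One simple interpretable view links two concepts whenever a publication reports on both, weighted by how many publications do so; concept pairs that are studied under the same property but never appear together in one publication are flagged as candidate gaps. The function names `concept_graph` and `candidate_gaps` and this gap criterion are hypothetical, not the paper's method.

```python
from collections import defaultdict
from itertools import combinations

def concept_graph(entries):
    """Hypothetical graph view: edge weight = number of papers reporting both concepts.

    `entries` maps (paper, concept, property) tuples to values; the values
    themselves are ignored here, only which cells are nonzero matters.
    """
    by_paper = defaultdict(set)
    for (paper, concept, _prop) in entries:
        by_paper[paper].add(concept)
    edges = defaultdict(int)
    for concepts in by_paper.values():
        for a, b in combinations(sorted(concepts), 2):
            edges[(a, b)] += 1
    return dict(edges)

def candidate_gaps(entries):
    """Hypothetical gap criterion: concepts sharing a property but never co-studied."""
    by_prop = defaultdict(set)
    for (_paper, concept, prop) in entries:
        by_prop[prop].add(concept)
    linked = concept_graph(entries)
    gaps = set()
    for concepts in by_prop.values():
        for a, b in combinations(sorted(concepts), 2):
            if (a, b) not in linked:
                gaps.add((a, b))
    return sorted(gaps)

# Placeholder labels and values, not data from the paper.
entries = {
    ("paper_1", "concept_a", "property_p"): 1.0,
    ("paper_1", "concept_b", "property_p"): 2.0,
    ("paper_2", "concept_b", "property_q"): 0.5,
    ("paper_3", "concept_c", "property_q"): 0.7,
}
graph = concept_graph(entries)
gaps = candidate_gaps(entries)
print(graph)  # {('concept_a', 'concept_b'): 1}
print(gaps)   # [('concept_b', 'concept_c')]
```

Here `concept_b` and `concept_c` are both characterized by `property_q`, yet no single publication covers both, which is the kind of unexplored connection an agent might surface for follow-up.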

Abstract

The prevailing model for disseminating scientific knowledge relies on individual publications dispersed across numerous journals and archives. This legacy system is ill-suited to the recent exponential proliferation of publications, contributing to insurmountable information overload and to problems with reproducibility and retractions. We introduce the Discovery Engine, a framework to address these challenges by transforming a body of disconnected literature into a unified, computationally tractable representation of a scientific domain. Central to our approach is the LLM-driven distillation of publications into structured "knowledge artifacts," instances of a universal conceptual schema, complete with verifiable links to source evidence. These artifacts are then encoded into a high-dimensional Conceptual Tensor. This tensor serves as the primary, compressed representation of the…
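The abstract does not specify the conceptual schema or the tensor layout, so the following is only a minimal Python sketch under assumed names. Each hypothetical `KnowledgeArtifact` records one (concept, property, value) claim together with a pointer back to its source passage, and the hypothetical `encode_tensor` packs the artifacts into a sparse third-order tensor indexed by (publication, concept, property), stored in coordinate (COO) form. All labels and values in the example are placeholders.

```python
from dataclasses import dataclass

# Hypothetical artifact type: the paper's actual "universal conceptual schema"
# is not given in the abstract, so these fields are illustrative assumptions.
@dataclass(frozen=True)
class KnowledgeArtifact:
    paper_id: str   # publication the claim was distilled from
    concept: str    # entity the claim is about
    prop: str       # property asserted or measured for that entity
    value: float    # numeric value of the claim
    evidence: str   # pointer back to the supporting source passage

def encode_tensor(artifacts):
    """Pack artifacts into a sparse (paper, concept, property) tensor in COO form.

    Returns the tensor shape, the three axis vocabularies (label -> index),
    a dict of nonzero entries keyed by index triples, and a parallel dict
    that keeps each entry traceable to its source evidence.
    """
    papers, concepts, props = {}, {}, {}
    entries, provenance = {}, {}
    for a in artifacts:
        # Assign each unseen label the next free index on its axis.
        i = papers.setdefault(a.paper_id, len(papers))
        j = concepts.setdefault(a.concept, len(concepts))
        k = props.setdefault(a.prop, len(props))
        entries[(i, j, k)] = a.value
        provenance[(i, j, k)] = a.evidence
    shape = (len(papers), len(concepts), len(props))
    return shape, (papers, concepts, props), entries, provenance

artifacts = [
    KnowledgeArtifact("paper_A", "concept_X", "property_p", 1.0, "paper_A, Table 1"),
    KnowledgeArtifact("paper_B", "concept_X", "property_q", 2.0, "paper_B, Sec. 2"),
    KnowledgeArtifact("paper_B", "concept_Y", "property_q", 3.0, "paper_B, Fig. 3"),
]
shape, axes, entries, provenance = encode_tensor(artifacts)
print(shape)  # (2, 2, 2)
```

COO storage is chosen here only because most (publication, concept, property) cells would be empty in a real corpus; keeping the evidence in a parallel map reflects the abstract's emphasis on verifiable links from each artifact back to its source.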
