HGNet: Scalable Foundation Model for Automated Knowledge Graph Generation from Scientific Literature

Devvrat Joshi; Islem Rekik

arXiv:2603.23136·cs.CL·March 25, 2026

Open Access · 3 Reviews

TL;DR

This paper introduces HGNet, a scalable framework for automated knowledge graph construction from scientific literature, combining novel entity and relation extraction methods with hierarchical consistency enforcement, achieving state-of-the-art results.

Contribution

The paper presents a two-stage framework with innovative semantic decomposition and hierarchy-aware relation extraction, formalizing hierarchical abstraction as a continuous property in Euclidean space.
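To make "hierarchical abstraction as a continuous property in Euclidean space" concrete, here is a minimal sketch under stated assumptions, not the paper's formulation. It assumes each entity has a plain Euclidean embedding, that abstraction is read off as the projection onto a single axis, and that a hinge loss pushes every parent above its child by a margin. The names `abstraction_scores`, `ordering_hinge_loss`, the axis vector, and the margin are all hypothetical.

```python
import numpy as np

def abstraction_scores(embeddings, axis):
    """Project each embedding onto a unit-norm abstraction axis (hypothetical setup)."""
    unit = axis / np.linalg.norm(axis)
    return embeddings @ unit

def ordering_hinge_loss(scores, parent_child_pairs, margin=1.0):
    """Mean hinge penalty over (parent, child) index pairs.

    A pair contributes zero once the parent's score exceeds the
    child's score by at least `margin`.
    """
    if not parent_child_pairs:
        return 0.0
    parents = np.array([p for p, _ in parent_child_pairs])
    children = np.array([c for _, c in parent_child_pairs])
    gaps = scores[parents] - scores[children]
    return float(np.mean(np.maximum(0.0, margin - gaps)))

# Toy example: three entities, most abstract first (assumed ordering).
embeddings = np.array([[3.0, 0.0], [2.0, 1.0], [0.5, -1.0]])
axis = np.array([2.0, 0.0])  # assumed abstraction direction
scores = abstraction_scores(embeddings, axis)

loss_ok = ordering_hinge_loss(scores, [(0, 1), (1, 2)])  # ordering respected
loss_bad = ordering_hinge_loss(scores, [(2, 0)])         # ordering violated
```

In this toy setup the loss is zero when every parent sits at least one margin above its child along the axis, and grows linearly with the size of any violation.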

Findings

01

Improves NER by 8.08% and RE by 5.99% on out-of-distribution tests.

02

Achieves 10.76% gain in NER and 26.2% in RE in zero-shot settings.

03

Establishes a new benchmark SPHERE for hierarchical relation extraction.

Abstract

Automated knowledge graph (KG) construction is essential for navigating the rapidly expanding body of scientific literature. However, existing approaches struggle to recognize long multi-word entities, often fail to generalize across domains, and typically overlook the hierarchical nature of scientific knowledge. While general-purpose large language models (LLMs) offer adaptability, they are computationally expensive and yield inconsistent accuracy on specialized tasks. As a result, current KGs are shallow and inconsistent, limiting their utility for exploration and synthesis. We propose a two-stage framework for scalable, zero-shot scientific KG construction. The first stage, Z-NERD, introduces (i) Orthogonal Semantic Decomposition (OSD), which promotes domain-agnostic entity recognition by isolating semantic "turns" in text, and (ii) a Multi-Scale TCQK attention mechanism that…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 2 · Confidence 3

Strengths

- The proposed ideas are well presented in isolation, e.g. the several hypotheses that the authors present to justify their system design seem reasonable.
- The ideas seem original in the area of scientific KG generation to me, and some are likely to be influential, especially the NER system and the use of the differentiable loss to ensure a logically consistent KG.

Weaknesses

- The architecture of the system is not sufficiently described in the main body of the paper to understand or reproduce the system. E.g. sections 3.1 and 3.2 that provide the methodologies for the NER and RE do not mention a specific architecture; rather they mention losses and modifications to attention heads. One must read the appendix to begin to understand the system.

Reviewer 02 · Rating 8 · Confidence 3

Strengths

Clear and appropriate model design to resolve the specific challenges in KG construction from literature data. Solid experimental evaluation with reliable improvement.

Weaknesses

None

Reviewer 03 · Rating 4 · Confidence 4

Strengths

- TCQK introduces architectural inductive bias for multi-word entities; CAF imposes a simple, interpretable Euclidean ordering of abstraction; DHL neatly encodes DAG and anti-shortcut constraints.
- Solid ablations (Tables 1–3) and complexity notes for DHL.
- Clean factorization of challenges (entity coherence, domain generalization, hierarchy, global consistency) and matching components.
- Consistent gains on SciERC/SciER and large zero-shot improvements on SPHERE.
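As a rough illustration of how a DAG constraint can be written as a differentiable penalty, the sketch below uses a well-known polynomial acyclicity term from continuous DAG structure learning. It is a swapped-in technique, not the DHL loss from the paper, whose exact form is not given on this page. The function name `acyclicity_penalty` and the toy adjacency matrices are assumptions.

```python
import numpy as np

def acyclicity_penalty(weights):
    """Return tr((I + W*W / d)^d) - d for a d x d weighted adjacency matrix.

    W*W is the elementwise square, so all entries are non-negative.
    The value is zero when the graph has no directed cycle and
    positive otherwise. This is NOT claimed to be the paper's DHL.
    """
    d = weights.shape[0]
    m = np.eye(d) + (weights * weights) / d
    return float(np.trace(np.linalg.matrix_power(m, d)) - d)

# Toy hierarchy: 0 -> 1 -> 2 is a valid DAG.
chain = np.array([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])

# Adding the edge 2 -> 0 closes a directed cycle.
cycle = chain.copy()
cycle[2, 0] = 1.0

chain_penalty = acyclicity_penalty(chain)
cycle_penalty = acyclicity_penalty(cycle)
```

Because the penalty is a smooth function of the edge weights, it can in principle be added to a training loss and minimized by gradient descent; an anti-shortcut term would need a separate penalty that this sketch does not attempt.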

Weaknesses

- SPHERE is generated and self-annotated by LLMs, so it may encode stylistic biases, inflated structure, or task leakage that favors the proposed inductive biases, though limited human validation is described (App. A.3).
- Large LLMs are evaluated “as is” (several OOM), and there is no strong hyperbolic/order-embedding baseline for hierarchy, which weakens the “simpler & better than hyperbolic” claim.
- CAF relies on anchors and topological depths; the procedure is unclear for standard datasets; OSD’s learning o

Taxonomy

Topics: Advanced Graph Neural Networks · Topic Modeling · Machine Learning in Healthcare