Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs
Yifan Wei, Xiaoyan Yu, Tengfei Pan, Angsheng Li, Li Du

TL;DR
The paper introduces SENATOR, a framework that uses structural entropy and Monte Carlo Tree Search to locate an LLM's knowledge gaps and generate targeted synthetic data to fill them, improving the model's knowledge in specialized domains.
Contribution
It presents a novel Structural Entropy-guided approach combined with MCTS to detect and repair knowledge gaps in LLMs through targeted data generation.
Findings
Enhanced performance on domain-specific benchmarks
Effective detection and repair of knowledge deficiencies
Improved factual accuracy in LLM outputs
Abstract
Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora, yet their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research, where high factual precision is required. While synthetic data provides a promising avenue for augmenting domain knowledge, existing methods frequently generate redundant samples that do not align with the model's true knowledge gaps. To overcome this limitation, we propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs. Our approach employs the Structure Entropy (SE) metric to quantify uncertainty along knowledge graph paths and leverages Monte Carlo Tree Search (MCTS) to selectively explore regions where the model lacks domain-specific knowledge. Guided by these insights, the…
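As a rough illustration of the kind of quantity involved, the sketch below computes the standard one-dimensional structural entropy of an undirected, unweighted graph, H1(G) = -sum over v of (d_v / 2m) * log2(d_v / 2m), where d_v is the degree of node v and m is the number of edges. This is not the paper's implementation: the abstract does not specify how SE is evaluated along knowledge graph paths or how it feeds into MCTS, and the function name and toy edges here are hypothetical.

```python
import math
from collections import defaultdict


def one_dim_structural_entropy(edges):
    """One-dimensional structural entropy of an undirected, unweighted graph.

    H1(G) = -sum_v (d_v / 2m) * log2(d_v / 2m)
    where d_v is the degree of node v and m is the number of edges.
    Illustrative only; the paper's path-level SE metric may differ.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total_degree = sum(degree.values())  # equals 2m
    if total_degree == 0:
        return 0.0  # an empty graph carries no structural uncertainty
    return -sum(
        (d / total_degree) * math.log2(d / total_degree)
        for d in degree.values()
    )


# Hypothetical toy knowledge graph: one hub entity linked to three others.
toy_edges = [
    ("aspirin", "pain"),
    ("aspirin", "fever"),
    ("aspirin", "bleeding"),
]
print(one_dim_structural_entropy(toy_edges))
```

In this measure, a graph whose degree mass is spread evenly across nodes scores higher than one dominated by a single hub, which is one intuition for why an entropy-style score could help rank regions of a knowledge graph for exploration.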
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Machine Learning in Healthcare
Methods: ALIGN
