Enhancing Knowledge Graph Construction: Evaluating with Emphasis on Hallucination, Omission, and Graph Similarity Metrics
Hussam Ghanem (ICB, UB), Christophe Cruz (ICB, UB)

TL;DR
This paper evaluates knowledge graph construction from text with metrics that target hallucination and omission, and shows that fine-tuning improves accuracy but can reduce generalization to new datasets.
Contribution
It introduces a refined evaluation framework that uses BERTScore for graph similarity with a 95% graph-matching threshold, and compares the original and fine-tuned Mistral models in zero-shot and few-shot settings.
Findings
Fine-tuning enhances graph accuracy and reduces hallucination and omission.
Measuring graph similarity with BERTScore and applying a 95% threshold gives a practical criterion for graph matching.
Fine-tuned models show decreased generalization on new datasets.
Abstract
Recent advancements in large language models have demonstrated significant potential in the automated construction of knowledge graphs from unstructured text. This paper builds upon our previous work [16], which evaluated various models using metrics like precision, recall, F1 score, triple matching, and graph matching, and introduces a refined approach to address the critical issues of hallucination and omission. We propose an enhanced evaluation framework incorporating BERTScore for graph similarity, setting a practical threshold of 95% for graph matching. Our experiments focus on the Mistral model, comparing its original and fine-tuned versions in zero-shot and few-shot settings. We further extend our experiments using examples from the KELM-sub training dataset, illustrating that the fine-tuned model significantly improves knowledge graph construction accuracy while reducing the…
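
The thresholded graph-matching idea described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the `linearize` step, the function names, and in particular `token_f1`, which is a simple token-overlap stand-in and not BERTScore. The set-difference definitions of hallucinated and omitted triples are likewise one plausible reading, not necessarily the authors' exact definitions.

```python
from collections import Counter
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def linearize(graph: Iterable[Triple]) -> str:
    """Flatten a graph into one string; sorting makes it order-independent."""
    return " ; ".join(" ".join(t) for t in sorted(graph))


def token_f1(candidate: str, reference: str) -> float:
    """Stand-in similarity in [0, 1]: token-overlap F1. This is NOT BERTScore."""
    c = Counter(candidate.lower().split())
    r = Counter(reference.lower().split())
    overlap = sum((c & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(c.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def graphs_match(
    pred: Iterable[Triple],
    gold: Iterable[Triple],
    similarity: Callable[[str, str], float] = token_f1,
    threshold: float = 0.95,  # the 95% threshold mentioned in the abstract
) -> bool:
    """Count two graphs as matching when their similarity reaches the threshold."""
    return similarity(linearize(pred), linearize(gold)) >= threshold


def hallucinated(pred: Iterable[Triple], gold: Iterable[Triple]) -> Set[Triple]:
    """Assumed definition: generated triples absent from the reference."""
    return set(pred) - set(gold)


def omitted(pred: Iterable[Triple], gold: Iterable[Triple]) -> Set[Triple]:
    """Assumed definition: reference triples missing from the generated graph."""
    return set(gold) - set(pred)


gold = [("Paris", "capital_of", "France"), ("France", "located_in", "Europe")]
pred = gold + [("Paris", "founded_by", "Romans")]  # one extra, unsupported triple

print(graphs_match(gold, gold))   # identical graphs
print(graphs_match(pred, gold))   # extra triple lowers the similarity
print(hallucinated(pred, gold), omitted(pred, gold))
```

Because `similarity` is a parameter, an embedding-based scorer such as BERTScore could be passed in instead of the stand-in without changing the thresholding logic.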
