Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity

Qiyao Wei; Edward Morrell; Lea Goetz; Mihaela van der Schaar

arXiv:2511.19925 · cs.AI · November 26, 2025

TL;DR

This paper introduces a knowledge-graph-based method for generating domain-specific benchmarks that evaluate semantic similarity measures for LLM outputs. It addresses limitations of existing benchmarks and shows that both the domain and the type of semantic variation affect how well similarity measures perform.

Contribution

The paper presents a novel knowledge-graph (KG)-based approach for constructing semantic similarity benchmarks across multiple domains, reducing reliance on subjective human judgment.

Findings

01. Sub-types of semantic variation affect how well similarity methods perform.

02. The domain influences the effectiveness of similarity measures.

03. No single method outperforms the others across all settings.
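These findings compare similarity methods per semantic-variation sub-type and per domain. The page does not say which metric the paper reports, so the following is only a sketch assuming one common choice: AUROC, the probability that a randomly chosen similar pair receives a higher similarity score than a randomly chosen dissimilar pair. The function names, the `(label, score)` record layout, and the sub-type labels (`entity_swap`, `negation`) are hypothetical, and the scores are made up for illustration, not results from the paper.

```python
from collections import defaultdict


def auroc(pos_scores, neg_scores):
    """Probability that a random positive pair outscores a random negative pair.

    Ties count as half a win. Quadratic in the number of pairs, which is
    fine for small benchmarks.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def auroc_per_subtype(records):
    """records: iterable of (label, score); label is "similar" or a
    (hypothetical) dissimilar sub-type name.

    Each sub-type's dissimilar pairs are scored against all similar pairs,
    so a low value flags the kind of variation a method fails to detect.
    """
    records = list(records)
    positives = [score for label, score in records if label == "similar"]
    negatives = defaultdict(list)
    for label, score in records:
        if label != "similar":
            negatives[label].append(score)
    return {sub: auroc(positives, negs) for sub, negs in sorted(negatives.items())}


if __name__ == "__main__":
    # Made-up scores for illustration only; sub-type names are hypothetical.
    toy = [
        ("similar", 0.9), ("similar", 0.8),
        ("entity_swap", 0.85), ("entity_swap", 0.4),
        ("negation", 0.95), ("negation", 0.7),
    ]
    print(auroc_per_subtype(toy))  # {'entity_swap': 0.75, 'negation': 0.5}
```

A per-domain comparison works the same way: filter the records to one domain before calling `auroc_per_subtype`. Reporting one number per sub-type, rather than one pooled number, is what allows a finding like "no single method wins everywhere" to surface.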

Abstract

Evaluating the open-form textual responses generated by Large Language Models (LLMs) typically requires measuring the semantic similarity of the response to a (human-generated) reference. However, there is evidence that current semantic similarity methods may capture syntactic or lexical form rather than semantic content. While benchmarks exist for semantic equivalence, they often suffer from high generation costs due to reliance on subjective human judgment, limited availability for domain-specific applications, and unclear definitions of equivalence. This paper introduces a novel method for generating benchmarks to evaluate semantic similarity methods for LLM outputs, specifically addressing these limitations. Our approach leverages knowledge graphs (KGs) to generate pairs of natural-language statements that are semantically similar or dissimilar, with dissimilar pairs categorized into one…
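The core step described in the abstract, turning knowledge-graph facts into statement pairs whose labels are known in advance, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's pipeline: facts are assumed to be `(subject, relation, object)` triples, sentences come from hand-written templates (the hypothetical `TEMPLATES`), a similar pair renders one triple with two different templates, and the single dissimilar sub-type shown (the hypothetical `object_swap`) replaces the object with another entity the graph uses with the same relation. The paper's own dissimilarity sub-types are cut off in the abstract and are not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


# Hypothetical templates: two phrasings per relation, so that a "similar"
# pair states the same fact in different words.
TEMPLATES = {
    "treats": ["{s} treats {o}.", "{o} can be treated with {s}."],
    "causes": ["{s} causes {o}.", "{o} can be caused by {s}."],
}


def render(t, template_idx):
    text = TEMPLATES[t.relation][template_idx].format(s=t.subject, o=t.obj)
    return text[0].upper() + text[1:]


def similar_pair(t):
    # Same triple, two templates: same meaning, different wording.
    return render(t, 0), render(t, 1), "similar"


def object_swap_pair(t, kg, rng):
    # Hypothetical dissimilar sub-type: keep subject and relation, swap in a
    # different object seen with the same relation elsewhere in the graph.
    candidates = sorted({x.obj for x in kg if x.relation == t.relation and x.obj != t.obj})
    if not candidates:
        return None  # no alternative object available for this relation
    swapped = Triple(t.subject, t.relation, rng.choice(candidates))
    return render(t, 0), render(swapped, 0), "object_swap"


def build_pairs(kg, seed=0):
    rng = random.Random(seed)  # seeded so the benchmark is reproducible
    pairs = []
    for t in kg:
        pairs.append(similar_pair(t))
        neg = object_swap_pair(t, kg, rng)
        if neg is not None:
            pairs.append(neg)
    return pairs


# Toy graph for illustration only.
KG = [
    Triple("aspirin", "treats", "headache"),
    Triple("ibuprofen", "treats", "fever"),
    Triple("smoking", "causes", "lung cancer"),
]

if __name__ == "__main__":
    for a, b, label in build_pairs(KG):
        print(f"{label:12} | {a} | {b}")
```

Because each label follows from the graph rather than from an annotator, no per-pair human judgment is needed. Note also that the `object_swap` pair ("Aspirin treats headache." vs. "Aspirin treats fever.") shares more words than the reworded `similar` pair, so a measure driven by lexical overlap may rank the dissimilar pair higher, which is the failure mode the abstract raises.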

Videos

Semantic-KG: Using Knowledge Graphs to Construct Benchmarks for Measuring Semantic Similarity (SlidesLive)

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Artificial Intelligence in Healthcare and Education