# Enhancing Semantic Document Retrieval: Employing Group Steiner Tree Algorithm with Domain Knowledge Enrichment

**Authors:** Apurva Kulkarni, Chandrashekar Ramanathan, and Vinu E Venugopal

arXiv: 2508.20543 · 2025-08-29

## TL;DR

This paper introduces a semantic document retrieval algorithm that builds on a Group Steiner Tree formulation and incorporates domain-specific knowledge. On a benchmark of 170 real-world search queries, the resulting SemDR system reaches 90% precision and 82% accuracy, improving on the baseline systems.

## Contribution

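The paper makes two contributions: (a) an algorithm, 'Semantic-based Concept Retrieval using Group Steiner Tree', that incorporates domain information into semantic-aware knowledge representation and data access, and (b) an implementation of that algorithm in a document retrieval system (SemDR) built on real-world data, with results verified by domain experts.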

## Key findings

- Reached 90% precision on a benchmark of 170 real-world search queries.
- Reached 82% accuracy on the same benchmark.
- Improved substantially on the baseline systems, with results evaluated and verified by domain experts.
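For readers less familiar with the two reported metrics, the sketch below computes them from their standard textbook definitions: precision = TP / (TP + FP) and accuracy = (TP + TN) / (TP + TN + FP + FN). The paper's exact evaluation protocol is not given in this summary, so the function name `confusion_counts` and the document IDs are hypothetical, and the toy judgments are illustrative only (they are not the paper's data).

```python
def confusion_counts(retrieved, relevant, corpus):
    """Count TP, FP, FN, TN for one query over a fixed corpus of document IDs."""
    tp = len(retrieved & relevant)            # retrieved and relevant
    fp = len(retrieved - relevant)            # retrieved but not relevant
    fn = len(relevant - retrieved)            # relevant but missed
    tn = len(corpus - retrieved - relevant)   # correctly left out
    return tp, fp, fn, tn


def precision(tp, fp):
    """Fraction of retrieved documents that are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all retrieve/skip decisions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical toy judgments for a single query (not from the paper).
corpus = {f"d{i}" for i in range(1, 11)}
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d5"}

tp, fp, fn, tn = confusion_counts(retrieved, relevant, corpus)
query_precision = precision(tp, fp)
query_accuracy = accuracy(tp, tn, fp, fn)
```

Precision ignores missed documents, while accuracy also credits correctly excluded ones, which is one reason the two figures can differ for the same system.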

## Abstract

Retrieving pertinent documents from various data sources with diverse characteristics poses a significant challenge for Document Retrieval Systems. The challenge is compounded when the semantic relationship between data and domain knowledge must be taken into account. Existing semantic retrieval systems (usually built on Knowledge Graphs created from open-access resources and generic domain knowledge) hold promise in delivering relevant outcomes, but their precision may be compromised by the absence of domain-specific information and by reliance on outdated knowledge sources. This research makes two key contributions: (a) a versatile algorithm, 'Semantic-based Concept Retrieval using Group Steiner Tree', that incorporates domain information to enhance semantic-aware knowledge representation and data access, and (b) a practical implementation of the proposed algorithm within a document retrieval system using real-world data. To assess the effectiveness of the resulting SemDR system, this work conducts performance evaluations using a benchmark of 170 real-world search queries. Domain experts rigorously evaluate and verify the results to ensure their validity and accuracy. The experimental findings demonstrate substantial advancements over the baseline systems, with precision and accuracy reaching 90% and 82% respectively.
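To make the Group Steiner Tree idea concrete: given a weighted graph and several groups of nodes, the goal is a low-cost tree that touches at least one node from every group. In a retrieval setting, each group could hold the knowledge-graph concepts that match one query keyword. The sketch below is a simple shortest-path heuristic, not the paper's algorithm (which is not described in this summary): it tries every node as a root, connects the root to the nearest member of each group, and keeps the cheapest result. The concept graph, its weights, and the helper names `add_edge`, `dijkstra`, and `group_steiner_heuristic` are all hypothetical.

```python
import heapq


def add_edge(graph, u, v, weight):
    """Insert an undirected weighted edge into an adjacency-dict graph."""
    graph.setdefault(u, {})[v] = weight
    graph.setdefault(v, {})[u] = weight


def dijkstra(graph, source):
    """Return (dist, parent) maps of shortest paths from source."""
    dist = {source: 0.0}
    parent = {source: None}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent


def group_steiner_heuristic(graph, groups):
    """Approximate a group Steiner tree; returns (cost, set of edges)."""
    best_cost, best_edges = float("inf"), None
    for root in graph:
        dist, parent = dijkstra(graph, root)
        edges = set()
        for group in groups:
            reachable = [n for n in group if n in dist]
            if not reachable:
                break  # this root cannot cover every group
            node = min(reachable, key=lambda n: dist[n])
            # Walk back to the root; the paths share one shortest-path tree,
            # so their union is itself a tree.
            while parent[node] is not None:
                edges.add(frozenset((node, parent[node])))
                node = parent[node]
        else:
            cost = sum(graph[u][v] for u, v in map(tuple, edges))
            if cost < best_cost:
                best_cost, best_edges = cost, edges
    return best_cost, best_edges


# Hypothetical concept graph; lower weight means closer semantic relation.
graph = {}
add_edge(graph, "crop", "wheat", 1.0)
add_edge(graph, "crop", "rice", 1.0)
add_edge(graph, "crop", "irrigation", 2.0)
add_edge(graph, "irrigation", "canal", 1.0)
add_edge(graph, "irrigation", "rainfall", 1.0)
add_edge(graph, "wheat", "rainfall", 3.0)

# One group per query keyword: candidate concepts that keyword may refer to.
groups = [{"wheat", "rice"}, {"canal"}, {"rainfall"}]
cost, edges = group_steiner_heuristic(graph, groups)
```

The heuristic runs one Dijkstra pass per candidate root, so it stays cheap on small graphs but carries no optimality guarantee in general; exact Group Steiner Tree is NP-hard, which is why practical systems rely on approximations or dynamic programming over small keyword sets.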

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20543/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20543/full.md

## References

58 references — full list in the complete paper: https://tomesphere.com/paper/2508.20543/full.md

---
Source: https://tomesphere.com/paper/2508.20543