# Improving Entity Retrieval on Structured Data

**Authors:** Besnik Fetahu, Ujwal Gadiraju, Stefan Dietze

arXiv: 1703.10349 · 2017-03-31

## TL;DR

This paper presents a two-step entity retrieval method for structured data: entities are clustered offline, and the precomputed clusters are then used to expand and re-rank BM25F results, which improves retrieval when explicit cross-dataset links are scarce.

## Contribution

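It introduces a two-fold approach to entity retrieval on Linked Data: an offline step that clusters entities with x-means and spectral clustering, and a retrieval model that uses those clusters to expand and re-rank an initial BM25F result set.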

## Key findings

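- Significant improvements over the baseline and state-of-the-art approaches.
- Precomputed clusters are used to expand the initial result set with relevant entities that lack explicit links.
- Evaluated on the Billion Triple Challenge (BTC12) dataset.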

## Abstract

The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the *x-means* and *spectral* clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches.
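
The retrieve, expand, and re-rank flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the entity data is invented, the cluster assignments are hard-coded in place of x-means or spectral clustering output, a simple term-overlap score stands in for BM25F, and expansion naively adds every cluster mate rather than selecting them by query, entity, and cluster features. All names (`ENTITIES`, `CLUSTER_OF`, `overlap_score`, `retrieve`, `expand`, `rerank`, `search`) are hypothetical.

```python
from collections import defaultdict

# Toy entity descriptions (hypothetical data, not taken from BTC12).
ENTITIES = {
    "dbp:Berlin": "berlin capital city germany",
    "geo:Berlin": "berlin populated place germany",
    "dbp:Hamburg": "hamburg port city germany",
    "dbp:Einstein": "albert einstein physicist",
}

# Step 1 (offline): cluster assignments. Hard-coded here as a stand-in
# for the output of x-means or spectral clustering.
CLUSTER_OF = {
    "dbp:Berlin": 0,
    "geo:Berlin": 0,
    "dbp:Hamburg": 0,
    "dbp:Einstein": 1,
}


def overlap_score(query, text):
    """Fraction of query terms present in the text (assumed stand-in for BM25F)."""
    q_terms = query.lower().split()
    terms = set(text.split())
    return sum(t in terms for t in q_terms) / len(q_terms)


def retrieve(query, k=1):
    """Initial retrieval: top-k entities with a non-zero score."""
    scored = [(overlap_score(query, text), eid) for eid, text in ENTITIES.items()]
    scored = [(s, e) for s, e in scored if s > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [eid for _, eid in scored[:k]]


def expand(seed_ids):
    """Step 2a: add the cluster mates of every initially retrieved entity."""
    members = defaultdict(list)
    for eid, cid in CLUSTER_OF.items():
        members[cid].append(eid)
    expanded = list(seed_ids)
    for eid in seed_ids:
        for mate in members[CLUSTER_OF[eid]]:
            if mate not in expanded:
                expanded.append(mate)
    return expanded


def rerank(query, candidates):
    """Step 2b: order the expanded set by relevance to the query (ties by id)."""
    return sorted(candidates, key=lambda e: (-overlap_score(query, ENTITIES[e]), e))


def search(query, k=1):
    return rerank(query, expand(retrieve(query, k)))


if __name__ == "__main__":
    print(search("berlin city", k=1))
```

With `k=1`, the initial retrieval returns only `dbp:Berlin`; expansion then brings in `geo:Berlin` and `dbp:Hamburg` from the same cluster even though no explicit link connects them, and the final ordering is decided by the re-ranking score alone.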

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1703.10349/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/1703.10349/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/1703.10349/full.md

---
Source: https://tomesphere.com/paper/1703.10349