# ROCKER: A Refinement Operator for Key Discovery

**Authors:** Tommaso Soru, Edgard Marx, Axel-Cyrille Ngonga Ngomo

arXiv: 1705.04380 · 2017-05-15

## TL;DR

ROCKER is a novel refinement operator designed to efficiently discover keys in RDF data, improving accuracy and scalability for large knowledge bases.

## Contribution

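The paper introduces ROCKER, a refinement operator for key discovery in RDF data that the authors prove to be finite, proper, and non-redundant. It combines these theoretical properties with two monotonicities of keys and a hash index for computing the discriminability score.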
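To make the three operator properties concrete, here is a minimal sketch of a refinement operator over sets of property indices. It is an illustration under stated assumptions, not the operator defined in the paper: the names `refine` and `enumerate_all` are hypothetical, and the rule "only append an index larger than the current maximum" is a standard way to get these properties, chosen here for simplicity.

```python
from typing import List, Tuple

Node = Tuple[int, ...]  # a property set, stored as a sorted tuple of indices


def refine(node: Node, n_properties: int) -> List[Node]:
    """Return the refinements of `node` (hypothetical operator, not the paper's).

    Each refinement appends one index larger than the last one in `node`.
    - finite: at most `n_properties` refinements are returned
    - proper: every refinement is strictly larger than `node`
    - non-redundant: each sorted tuple has exactly one parent (drop its last index)
    """
    start = node[-1] + 1 if node else 0
    return [node + (j,) for j in range(start, n_properties)]


def enumerate_all(n_properties: int) -> List[Node]:
    """Breadth-first traversal of the refinement tree, starting at the empty set."""
    frontier: List[Node] = [()]
    visited: List[Node] = []
    while frontier:
        next_frontier: List[Node] = []
        for node in frontier:
            visited.append(node)
            next_frontier.extend(refine(node, n_properties))
        frontier = next_frontier
    return visited


nodes = enumerate_all(3)
# Every subset of {0, 1, 2} appears exactly once, so no candidate is tested twice.
assert len(nodes) == len(set(nodes)) == 2 ** 3
```

Because no property set is reachable along two different refinement paths, a search built on such an operator needs no duplicate check across candidates.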

## Key findings

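- ROCKER yields more accurate results than existing state-of-the-art key discovery techniques.
- A hash index for computing the discriminability score lets the approach scale to very large knowledge bases.
- Its runtime is comparable to that of those techniques, and it consumes less memory.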

## Abstract

The Linked Data principles provide a decentralized approach for publishing structured data in the RDF format on the Web. In contrast to structured data published in relational databases, where a key is often provided explicitly, RDF data requires finding a set of properties that identifies a resource uniquely, which is a non-trivial task. Still, finding keys is of central importance for manifold applications such as resource deduplication, link discovery, logical data compression and data integration. In this paper, we address this research gap by specifying a refinement operator, dubbed ROCKER, which we prove to be finite, proper and non-redundant. We combine the theoretical characteristics of this operator with two monotonicities of keys to obtain a time-efficient approach for detecting keys, i.e., sets of properties that describe resources uniquely. We then utilize a hash index to compute the discriminability score efficiently. This ensures that our approach can scale to very large knowledge bases. Results show that ROCKER yields more accurate results, has a comparable runtime, and consumes less memory w.r.t. existing state-of-the-art techniques.
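The idea of scoring a property set with a hash index can be sketched as follows. This is a toy illustration, not the paper's implementation. It assumes the score is the number of distinct value signatures divided by the number of resources, so that a score of 1.0 means the property set is a key; the paper's exact definition may differ. The function names, the sample data, and the exhaustive candidate loop are all hypothetical. The pruning step uses the common observation that every superset of a key is also a key.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

# Toy model: resource URI -> property -> set of values
# (RDF properties may be multi-valued, hence the sets).
Description = Dict[str, Set[str]]


def score(resources: Dict[str, Description], props: Sequence[str]) -> float:
    """Assumed discriminability score: distinct signatures / number of resources."""
    if not resources:
        return 0.0
    index = set()  # hash index over value signatures
    for description in resources.values():
        signature = tuple(frozenset(description.get(p, ())) for p in props)
        index.add(signature)
    return len(index) / len(resources)


def minimal_keys(
    resources: Dict[str, Description], properties: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Find minimal keys by testing candidates in order of increasing size."""
    keys: List[Tuple[str, ...]] = []
    for size in range(1, len(properties) + 1):
        for candidate in combinations(properties, size):
            # Key monotonicity: a superset of a known key is a key, so skip it.
            if any(set(key) <= set(candidate) for key in keys):
                continue
            if score(resources, candidate) == 1.0:
                keys.append(candidate)
    return keys


data = {
    "ex:a": {"name": {"Ann"}, "city": {"Rome"}, "year": {"1990"}},
    "ex:b": {"name": {"Ann"}, "city": {"Oslo"}, "year": {"1990"}},
    "ex:c": {"name": {"Bob"}, "city": {"Rome"}, "year": {"1985"}},
}
found = minimal_keys(data, ["name", "city", "year"])
```

In this toy data no single property separates all three resources, while `("name", "city")` and `("city", "year")` each do. Building the signature set takes one pass over the resources per candidate, which is what makes a hash-based score cheap to evaluate.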

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.04380/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/1705.04380/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/1705.04380/full.md

---
Source: https://tomesphere.com/paper/1705.04380