Online Topic-Aware Entity Resolution Over Incomplete Data Streams (Technical Report)
Weilong Ren, Xiang Lian, Kambiz Ghazinour

TL;DR
This paper introduces a novel online method for entity resolution over incomplete data streams, incorporating topic-awareness, data imputation, and efficient indexing to improve accuracy and performance in real-time applications.
Contribution
The paper formulates the TER-iDS problem and proposes a new online algorithm with imputation, pruning, and indexing strategies for effective topic-aware entity resolution over incomplete streams.
Findings
The proposed method effectively imputes missing data in real-time.
It achieves high accuracy in identifying matching entities.
The approach demonstrates efficiency on real datasets.
Abstract
In many real-world applications, such as data integration, social network analysis, and the Semantic Web, entity resolution (ER) is an important and fundamental problem, which identifies and links the same real-world entities across various data sources. While prior works usually consider ER over static and complete data, in practice, application data are usually collected in a streaming fashion and often have missing attributes (due to the inaccuracy of data extraction techniques). Therefore, in this paper, we formulate and tackle a novel problem, topic-aware entity resolution over incomplete data streams (TER-iDS), which imputes incomplete tuples online and detects pairs of topic-related matching entities from incomplete data streams. In order to effectively and efficiently tackle the TER-iDS problem, we propose an effective imputation strategy, carefully design effective…
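The pipeline described in the abstract (impute an incoming incomplete tuple, then look for topic-related matching entities among recent tuples) can be illustrated with a minimal sketch. It is not the authors' algorithm: the donor-based imputation, the token-level Jaccard similarity, the keyword topic test, the fixed-size window, and every name below (`impute`, `on_topic`, `resolve_stream`, `window_size`, `threshold`) are assumptions chosen for illustration, and the brute-force window scan stands in for the pruning and indexing strategies the paper proposes.

```python
from collections import deque


def tokens(value):
    """Lowercased whitespace tokens of an attribute value (empty set if missing)."""
    return set(value.lower().split()) if value else set()


def jaccard(a, b):
    """Jaccard similarity of two token sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def impute(record, repository, attrs):
    """Assumed imputation rule: copy missing attributes from the complete
    repository tuple most similar to the record on its known attributes."""
    missing = [a for a in attrs if record.get(a) is None]
    if not missing:
        return dict(record)
    known = [a for a in attrs if record.get(a) is not None]

    def donor_score(donor):
        return sum(jaccard(tokens(record[a]), tokens(donor[a])) for a in known)

    donor = max(repository, key=donor_score)
    filled = dict(record)
    for a in missing:
        filled[a] = donor[a]
    return filled


def record_similarity(r1, r2, attrs):
    """Average per-attribute Jaccard similarity between two complete records."""
    return sum(jaccard(tokens(r1[a]), tokens(r2[a])) for a in attrs) / len(attrs)


def on_topic(record, attrs, topic_keywords):
    """Assumed topic test: the record shares at least one token with the topic."""
    all_tokens = set().union(*(tokens(record[a]) for a in attrs))
    return bool(all_tokens & topic_keywords)


def resolve_stream(stream, repository, attrs, topic_keywords,
                   window_size=3, threshold=0.5):
    """Process (id, record) pairs in arrival order and return matching id pairs."""
    window = deque(maxlen=window_size)  # oldest tuple expires automatically
    matches = []
    for rid, record in stream:
        filled = impute(record, repository, attrs)
        if on_topic(filled, attrs, topic_keywords):
            for other_id, other in window:
                if (on_topic(other, attrs, topic_keywords)
                        and record_similarity(filled, other, attrs) >= threshold):
                    matches.append((other_id, rid))
        window.append((rid, filled))
    return matches


# Hypothetical toy data, invented for this sketch.
attrs = ["title", "venue"]
repository = [
    {"title": "entity resolution over data streams", "venue": "vldb"},
    {"title": "deep learning for image recognition", "venue": "cvpr"},
]
stream = [
    ("a1", {"title": "entity resolution over data streams", "venue": "vldb"}),
    ("b1", {"title": "entity resolution over streams", "venue": None}),
    ("c1", {"title": "deep learning for image recognition", "venue": "cvpr"}),
]
matches = resolve_stream(stream, repository, attrs, {"entity", "resolution"})
print(matches)  # [('a1', 'b1')]
```

In this toy run, tuple `b1` arrives without a venue, receives `vldb` from the closest repository tuple, and is then matched with `a1`; tuple `c1` is skipped because it shares no token with the topic keywords.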
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Data Mining Algorithms and Applications
