Efficient Web-based Data Imputation with Graph Model
Yiwen Tang, Hongzhi Wang, Shiwei Zhang, Huijun Zhang, and Ruoxi Shi

TL;DR
This paper presents a graph-based method for web-based data imputation that leverages data dependencies and minimal web access to efficiently fill missing values, outperforming existing approaches.
Contribution
It introduces a novel graph model for dependency representation and two algorithms for keyword generation and value extraction in web-based data imputation.
Findings
The approach efficiently imputes missing data with minimal web access.
Experimental results show superior performance over existing methods.
The method effectively utilizes data dependencies for high-quality imputation.
Abstract
A challenge for data imputation is the lack of knowledge. In this paper, we address this challenge by drawing on extra knowledge from the web. To achieve high-performance web-based imputation, we use dependencies, i.e., functional dependencies (FDs) and conditional functional dependencies (CFDs), to impute as many values as possible automatically, and fill in the remaining missing values with minimal web access, whose cost is relatively large. To make full use of the dependencies, we model the dependency set on the data as a graph and, based on this graph model, perform automatic imputation and keyword generation for web-based imputation. With the generated keywords, we design two algorithms to extract values for imputation from the search results. Extensive experimental results on real-world data collections show that the proposed approach imputes missing values efficiently and effectively compared to existing approaches.
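The two-stage idea in the abstract (fill what the dependencies determine, then query the web only for what remains) can be illustrated with a minimal sketch. This is not the paper's graph model or its algorithms: the toy relation, the FDs, and the function names `impute_with_fds` and `web_queries` are all hypothetical, and only plain FDs are handled, not CFDs.

```python
# Hypothetical toy relation; None marks a missing value.
rows = [
    {"zip": "10001", "city": "New York", "state": "NY"},
    {"zip": "10001", "city": None, "state": "NY"},
    {"zip": "60601", "city": "Chicago", "state": None},
    {"zip": "94105", "city": None, "state": "CA"},
]

# Assumed FDs, written as (determinant attributes, dependent attribute).
fds = [(("zip",), "city"), (("city",), "state")]


def impute_with_fds(rows, fds):
    """Fill missing values from tuples that agree on an FD's left-hand side.

    Repeats until nothing changes, so a value filled by one FD
    can enable another FD on a later pass.
    """
    rows = [dict(r) for r in rows]  # work on a copy
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Map each complete LHS value combination to a known RHS value.
            known = {}
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if None not in key and r[rhs] is not None:
                    known.setdefault(key, r[rhs])
            for r in rows:
                key = tuple(r[a] for a in lhs)
                if r[rhs] is None and key in known:
                    r[rhs] = known[key]
                    changed = True
    return rows


def web_queries(rows, fds):
    """Build one keyword query per still-missing cell from its FD determinants."""
    queries = []
    for i, r in enumerate(rows):
        for lhs, rhs in fds:
            if r[rhs] is None and all(r[a] is not None for a in lhs):
                keywords = " ".join(str(r[a]) for a in lhs) + " " + rhs
                queries.append((i, rhs, keywords))
    return queries


filled = impute_with_fds(rows, fds)
pending = web_queries(filled, fds)
```

In this toy run, the missing city in the second row is filled locally from the first row, while the two cells that no other tuple determines are left for web lookup, each with a keyword string built from its determinant values.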
Taxonomy
Topics: Data Quality and Management · Data Management and Algorithms · Data Mining Algorithms and Applications
