Interactive Data Integration through Smart Copy & Paste
Zachary Ives (University of Pennsylvania), Craig Knoblock (University of Southern California - Information Sciences Institute), Steve Minton (Fetch Technologies), Marie Jacob (University of Pennsylvania), Partha Talukdar (University of Pennsylvania)

TL;DR
This paper introduces CopyCat, a system that simplifies data integration by allowing users to perform extraction, schema creation, and mapping through intuitive copy-and-paste actions, with system suggestions and learning from user feedback.
Contribution
The paper presents a novel smart copy-and-paste model and prototype system for interactive data integration, unifying design-time and run-time processes without specialized tools.
Findings
Prototype system demonstrates effective data integration via copy-and-paste
System provides auto-completion suggestions with provenance explanations
System learns from user feedback to improve integration suggestions
Abstract
In many scenarios, such as emergency response or ad hoc collaboration, it is critical to reduce the overhead in integrating data. Ideally, one could perform the entire process interactively under one unified interface: defining extractors and wrappers for sources, creating a mediated schema, and adding schema mappings, all while seeing how these impact the integrated view of the data, and refining the design accordingly. We propose a novel smart copy and paste (SCP) model and architecture for seamlessly combining the design-time and run-time aspects of data integration, and we describe an initial prototype, the CopyCat system. In CopyCat, the user does not need special tools for the different stages of integration: instead, the system watches as the user copies data from applications (including the Web browser) and pastes them into CopyCat's spreadsheet-like workspace. CopyCat…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Database Systems and Queries
