WebRelate: Integrating Web Data with Spreadsheets using Examples
Jeevana Priya Inala, Rishabh Singh

TL;DR
WebRelate is a system that simplifies web data integration with spreadsheets by learning to generate URLs and extract data using minimal input-output examples, making web data joining accessible to non-programmers.
Contribution
WebRelate introduces a novel approach that decomposes web data integration into URL learning and data extraction, using expressive languages and efficient synthesis from few examples.
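To make the two-stage decomposition concrete, here is a minimal sketch in Python. It is not WebRelate's synthesis algorithm or its DSLs. It only illustrates the idea of first generalizing a URL from a single example and then extracting a value that depends on the input row. The function names, the example.com URL, the column names, and the hand-written extraction pattern are all assumptions made for illustration.

```python
import re
from urllib.parse import quote_plus


def learn_url_template(row, example_url):
    """Generalize one example URL by replacing each cell value that
    appears in it (URL-encoded) with a {column} placeholder.
    Hypothetical helper, not part of WebRelate."""
    template = example_url
    # Replace longer values first so a short value cannot clobber
    # part of a longer one.
    for column, value in sorted(row.items(), key=lambda kv: -len(kv[1])):
        encoded = quote_plus(value)
        if encoded and encoded in template:
            template = template.replace(encoded, "{" + column + "}")
    return template


def generate_url(template, row):
    """Fill a learned template with the URL-encoded cells of another row."""
    encoded_row = {column: quote_plus(value) for column, value in row.items()}
    return template.format(**encoded_row)


def extract_conditioned(page_text, row, key_column):
    """Extract the token that follows the row's key value on the page.
    The pattern is hand-written here as a stand-in; a real system would
    learn the extraction program from examples."""
    pattern = re.escape(row[key_column]) + r":\s*(\S+)"
    match = re.search(pattern, page_text)
    return match.group(1) if match else None


# Stage 1: one (row, URL) example generalizes to the other rows.
example_row = {"city": "New York", "date": "2016-05-01"}
example_url = "https://example.com/weather?city=New+York&date=2016-05-01"
template = learn_url_template(example_row, example_url)

other_row = {"city": "San Francisco", "date": "2016-05-02"}
other_url = generate_url(template, other_row)

# Stage 2: the extracted value depends on the input row's date.
page_text = "2016-05-01: 18C\n2016-05-02: 21C"
value = extract_conditioned(page_text, other_row, "date")
```

The sketch keeps the two stages independent: the URL template is learned without looking at page content, and extraction receives the row so that it can condition on input data, which matches the split described above.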
Findings
Learns data extraction programs within seconds for most tasks
Requires only one example for the majority of real-world tasks
Successfully applied to 88 web data integration tasks
Abstract
Data integration between web sources and relational data is a key challenge faced by data scientists and spreadsheet users. There are two main challenges in programmatically joining web data with relational data. First, most websites do not expose a direct interface for obtaining tabular data, so the user needs to formulate the logic for reaching a different webpage for each input row in the relational table. Second, after reaching the desired webpage, the user needs to write complex scripts to extract the relevant data, which is often conditioned on the input data. Since many data scientists and end-users come from diverse backgrounds, writing such complex regular-expression-based scripts to perform data integration tasks is often beyond their programming expertise. We present WebRelate, a system that allows users to join semi-structured web data with relational data in…
