Auctus: A Dataset Search Engine for Data Augmentation
Sonia Castelo, Rémi Rampin, Aécio Santos, Aline Bessa, Fernando Chirigati, and Juliana Freire

TL;DR
Auctus is a dataset search engine that helps users discover structured data from diverse sources and supports data augmentation, both to improve machine learning models and to enrich analytics.
Contribution
The paper introduces Auctus, a novel dataset search engine that addresses challenges in discovering structured data and demonstrates its utility through case studies.
Findings
Auctus effectively supports dataset discovery with a rich query interface.
It enhances machine learning models through data augmentation.
Case studies show how data augmentation with Auctus can enrich analytics.
Abstract
The large volumes of structured data currently available, from Web tables to open-data portals and enterprise data, open up new opportunities for progress in answering many important scientific, societal, and business questions. However, finding relevant data is difficult. While search engines have addressed this problem for Web documents, there are many new challenges involved in supporting the discovery of structured data. We demonstrate how the Auctus dataset search engine addresses some of these challenges. We describe the system architecture and how users can explore datasets through a rich set of queries. We also present case studies which show how Auctus supports data augmentation to improve machine learning models as well as to enrich analytics.
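Here, data augmentation means enriching a dataset with new attributes or records taken from a dataset found through search. The minimal sketch below illustrates the join case: each row of a base table gains columns from a discovered table that shares a key column. It is an illustration under stated assumptions, not Auctus's interface. The function `left_join_augment`, the taxi-trip and weather columns, and all values are hypothetical.

```python
# Hypothetical sketch of join-based data augmentation; NOT Auctus's actual API.
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join_augment(base: List[Row], extra: List[Row], key: str,
                      columns: List[str]) -> List[Row]:
    """Return copies of `base` rows extended with `columns` from `extra`.

    Rows are matched on `key`; base rows without a match receive None
    for the new columns (left-join semantics). `base` is not modified.
    """
    # Index the discovered dataset by its join key for constant-time lookups.
    index: Dict[Any, Row] = {}
    for row in extra:
        index.setdefault(row[key], row)  # keep the first row seen per key

    augmented: List[Row] = []
    for row in base:
        match = index.get(row[key])
        new_row = dict(row)  # shallow copy so the input stays unchanged
        for col in columns:
            new_row[col] = match.get(col) if match is not None else None
        augmented.append(new_row)
    return augmented


# Hypothetical example: add daily precipitation to a table of daily taxi trips.
trips = [
    {"date": "2019-01-01", "trips": 1200},
    {"date": "2019-01-02", "trips": 950},
    {"date": "2019-01-03", "trips": 1010},
]
weather = [
    {"date": "2019-01-01", "precip_mm": 0.0},
    {"date": "2019-01-02", "precip_mm": 12.5},
]
result = left_join_augment(trips, weather, "date", ["precip_mm"])
```

A left join keeps every row of the original table, so a model trained on the augmented data sees the same examples as before, only with extra features; a missing match shows up as None rather than as a dropped row. Union augmentation, which appends compatible records instead of columns, would need a different routine.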
Taxonomy
Topics: Data Quality and Management · Data Stream Mining Techniques · Web Data Mining and Analysis
