TL;DR
This study explores human decision-making in table unionability, develops a machine learning framework to assist humans, and compares human and language model performance, aiming to enhance data discovery processes.
Contribution
It provides the first comprehensive analysis of human behavior in table unionability and introduces a machine learning approach to augment human decision-making.
Findings
A preliminary comparison of humans and LLMs suggests that combining both is typically better than relying on either alone.
Combining human judgment with machine learning improves accuracy.
Human decision patterns reveal key features for unionability.
Abstract
Data discovery and, in particular, table unionability have become key tasks in modern Data Science. However, the human perspective on these tasks is still under-explored. This research therefore investigates human behavior in determining table unionability within data discovery. We designed an experimental survey and conducted a comprehensive analysis of human decision-making for table unionability. We use the observations from this analysis to develop a machine learning framework that boosts the (raw) performance of humans. Furthermore, we perform a preliminary study comparing LLM performance with human performance, which indicates that it is typically better to consider a combination of both. We believe that this work lays the foundations for developing future Human-in-the-Loop systems for efficient data discovery.
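The abstract does not define table unionability. Informally, two tables are unionable when the rows of one can be appended to the other because their columns describe the same attributes. The sketch below illustrates the idea with a deliberately naive heuristic, assuming each table is a dictionary mapping column names to lists of values and scoring column pairs by Jaccard overlap of their values. It is not the framework from the paper: `unionability_score`, `combined_decision`, the equal weighting, and the 0.5 threshold are all hypothetical, and the blend only gestures at the abstract's point that combining human and machine judgments can help.

```python
from typing import Dict, List

# Assumption: a table is a mapping from column name to column values.
Table = Dict[str, List[str]]


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two value sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def unionability_score(query: Table, candidate: Table) -> float:
    """Hypothetical score: for each query column, take its best Jaccard
    match among the candidate's columns, then average over query columns."""
    if not query or not candidate:
        return 0.0
    cand_sets = [set(vals) for vals in candidate.values()]
    best = [max(jaccard(set(vals), c) for c in cand_sets) for vals in query.values()]
    return sum(best) / len(best)


def combined_decision(human_says_unionable: bool, machine_score: float,
                      human_weight: float = 0.5, threshold: float = 0.5) -> bool:
    """Hypothetical blend of a human yes/no answer with a machine score in [0, 1]."""
    blended = human_weight * float(human_says_unionable) + (1 - human_weight) * machine_score
    return blended >= threshold


# Toy tables with different column names but overlapping values.
cities_a = {"city": ["Paris", "Berlin", "Rome"], "country": ["France", "Germany", "Italy"]}
cities_b = {"town": ["Paris", "Madrid", "Rome"], "nation": ["France", "Spain", "Italy"]}

score = unionability_score(cities_a, cities_b)
print(score)                            # 0.5
print(combined_decision(True, score))   # True
print(combined_decision(False, score))  # False
```

Note that the heuristic ignores column names entirely, so "city" and "town" still match on values; real unionability judgments, human or automated, typically weigh names, value overlap, and semantics together.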
