Unified Data Discovery across Query Modalities and User Intents
Tingting Wang, Shixun Huang, Zhifeng Bao, J. Shane Culpepper, Shazia Sadiq, Volkan Dedeoglu, Reza Arablouei

TL;DR
UniDisc is a unified framework for data discovery that supports multiple query modalities and user intents, leveraging graph-based representations to improve retrieval across diverse scenarios.
Contribution
It introduces a cross-modal, intent-agnostic data discovery model that learns from limited supervision using heterogeneous graph representations.
Findings
UniDisc outperforms strong baselines on seven datasets.
Supports both natural language and table queries.
Generalizes across diverse user intents without intent-specific tuning.
Abstract
Data discovery - retrieving relevant tables from a data lake in response to user queries - is a fundamental building block for downstream analytics. In practice, data discovery must support different query modalities, including natural language (NL) statements and tables, and accommodate diverse user intents, ranging from open-ended enrichment to task-driven inference for applications such as table question answering and fact verification. However, most existing methods are designed for a single query modality or a specific user intent, limiting their generalizability. We propose UniDisc, a unified data discovery framework that supports both NL statements and tables as queries and generalizes across diverse user intents without intent-specific representations or relevance modeling. UniDisc learns a common cross-modal representation model that produces unified representations for queries…
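The abstract says UniDisc maps NL statements and tables into one shared representation and ranks data-lake tables against it, but it does not specify the encoder. The sketch below shows only that general retrieval pattern. It is not UniDisc's method. A plain bag-of-words vector with cosine similarity stands in for the learned cross-modal model. The table layout (column name mapped to values), the serialization scheme, and every function name (`linearize`, `embed`, `cosine`, `discover`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Sequence, Tuple, Union

# Hypothetical types: a table maps column names to column values.
Table = Dict[str, Sequence[object]]
Query = Union[str, Table]


def linearize(item: Query) -> str:
    """Serialize an NL query or a table into one text string (hypothetical scheme)."""
    if isinstance(item, str):
        return item
    parts: List[str] = []
    for column, values in item.items():
        parts.append(str(column))
        parts.extend(str(v) for v in values)
    return " ".join(parts)


def embed(item: Query) -> Counter:
    """Toy stand-in for a learned cross-modal encoder: a sparse bag-of-words vector.

    The same function encodes NL queries, table queries, and lake tables,
    so all three share one vector space.
    """
    return Counter(re.findall(r"[a-z0-9]+", linearize(item).lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def discover(query: Query, lake: Dict[str, Table], k: int = 3) -> List[Tuple[str, float]]:
    """Score every lake table against the query and return the top-k (name, score) pairs."""
    q = embed(query)
    scored = [(name, cosine(q, embed(table))) for name, table in lake.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Because `discover` accepts either a string or a table, one retrieval path serves both query modalities. This is the property the abstract emphasizes. A real system would replace `embed` with a trained encoder rather than relying on lexical overlap.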
