AdaTyper: Adaptive Semantic Column Type Detection
Madelon Hulsebos, Paul Groth, Çağatay Demiralp

TL;DR
AdaTyper is a novel adaptive system that improves semantic column type detection in relational tables by using weak supervision and minimal human feedback, effectively handling new types and data shifts.
Contribution
The paper introduces AdaTyper, a hybrid type predictor that adapts to new semantic types and shifted data distributions with minimal supervision, addressing a key deployment challenge in table understanding.
Findings
F1-score improves with minimal examples
Approaches 0.6 precision after 5 examples
Outperforms existing adaptation methods
Abstract
Understanding the semantics of relational tables is instrumental for automation in data exploration and preparation systems. A key source for understanding a table is the semantics of its columns. With the rise of deep learning, learned table representations are now available, which can be applied for semantic type detection and achieve good performance on benchmarks. Nevertheless, we observe a gap between this performance and its applicability in practice. In this paper, we propose AdaTyper to address one of the most critical deployment challenges: adaptation. AdaTyper uses weak supervision to adapt a hybrid type predictor towards new semantic types and shifted data distributions at inference time, using minimal human feedback. The hybrid type predictor of AdaTyper combines rule-based methods and a light machine learning model for semantic column type detection. We evaluate the…
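To make the idea of a hybrid predictor concrete, here is a minimal sketch of the general pattern the abstract describes: try rule-based matchers first, fall back to a lightweight learned component, and let a handful of user-provided example columns register a new type at inference time. This is not the authors' implementation. The rule set, the character-trigram features, the centroid-style model, the similarity threshold, and all names (RULES, HybridTyper, adapt, predict) are assumptions chosen for illustration.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical rules: semantic type -> regex that every value in the column must match.
RULES = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "year": re.compile(r"(19|20)\d{2}"),
}


def featurize(values):
    """Toy feature vector: character-trigram counts over lowercased values."""
    counts = Counter()
    for v in values:
        s = f" {str(v).lower()} "
        counts.update(s[i:i + 3] for i in range(len(s) - 2))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(x * x for x in a.values())) * sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


class HybridTyper:
    """Rules first, then a nearest-centroid fallback (a stand-in for a light ML model)."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed cutoff below which no type is returned
        self.centroids = {}         # semantic type -> accumulated trigram counts

    def adapt(self, label, example_columns):
        """Register or update a type from a few example columns given as feedback."""
        centroid = self.centroids.setdefault(label, Counter())
        for column in example_columns:
            centroid.update(featurize(column))

    def predict(self, values):
        values = [str(v) for v in values if str(v).strip()]
        if not values:
            return None
        # Step 1: rule-based detection.
        for label, pattern in RULES.items():
            if all(pattern.fullmatch(v) for v in values):
                return label
        # Step 2: learned fallback over the types seen so far.
        feats = featurize(values)
        best = max(self.centroids,
                   key=lambda t: cosine(feats, self.centroids[t]),
                   default=None)
        if best is not None and cosine(feats, self.centroids[best]) >= self.threshold:
            return best
        return None


typer = HybridTyper()
before = typer.predict(["Paris", "London"])   # no rule matches, no types learned yet
typer.adapt("city", [["Berlin", "Paris", "Amsterdam", "London"]])
after = typer.predict(["Paris", "London"])    # now resolved by the fallback model
```

In this sketch the adaptation step only accumulates counts from the example columns. The weak-supervision procedure that the abstract mentions is not reproduced here.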
Taxonomy
Topics: Data Quality and Management · Data-Driven Disease Surveillance · Time Series Analysis and Forecasting
