Position: Foundation Models for Tabular Data within Systemic Contexts Need Grounding
Tassilo Klein, Johannes Hoffart

TL;DR
This paper emphasizes the importance of grounding foundation models for tabular data in operational context, introducing Semantically Linked Tables and a new training paradigm to improve their practical understanding and application.
Contribution
It proposes a novel model class, FMSLT, with dual-phase training and introduces the Operational Turing Test benchmark for evaluating operational grounding.
Findings
FMSLT is proposed as a model class that incorporates operational context into tabular data models.
Dual-phase training pre-trains on open-source code-data pairs and synthetic systems, then performs zero-shot inference on proprietary data.
Operational grounding is crucial for autonomous data-driven agents.
Abstract
This position paper argues that foundation models for tabular data face inherent limitations when isolated from operational context: the procedural logic, declarative rules, and domain knowledge that define how data is created and governed. Current approaches focus on single-table generalization or schema-level relationships, fundamentally missing the operational knowledge that gives data meaning. We introduce Semantically Linked Tables (SLT) and Foundation Models for SLT (FMSLT) as a new model class that grounds tabular data in its operational context. We propose dual-phase training: pre-training on open-source code-data pairs and synthetic systems to learn business logic mechanics, followed by zero-shot inference on proprietary data. We introduce the "Operational Turing Test" benchmark and argue that operational grounding is essential for autonomous agents in complex data…
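To make the idea of a table linked to its operational context more concrete, the sketch below pairs rows with one declarative rule and one piece of procedural logic. The abstract does not specify a representation, so everything here is an assumption made for illustration: the names `LinkedTable` and `DeclarativeRule`, the invoice columns, and the rule itself are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class DeclarativeRule:
    """A named constraint that every row is expected to satisfy (hypothetical)."""
    name: str
    check: Callable[[Row], bool]


@dataclass
class LinkedTable:
    """Hypothetical container: rows plus the operational context governing them."""
    rows: List[Row]
    # Declarative rules: constraints on what valid data looks like.
    rules: List[DeclarativeRule] = field(default_factory=list)
    # Procedural logic: functions that derive a column from other columns.
    derivations: Dict[str, Callable[[Row], float]] = field(default_factory=dict)

    def violations(self) -> List[Tuple[int, str]]:
        """Return (row_index, rule_name) for every rule that a row breaks."""
        return [
            (i, rule.name)
            for i, row in enumerate(self.rows)
            for rule in self.rules
            if not rule.check(row)
        ]

    def derive(self) -> List[Row]:
        """Return copies of the rows with derived columns filled in."""
        return [
            {**row, **{col: fn(row) for col, fn in self.derivations.items()}}
            for row in self.rows
        ]


# Illustrative data only: an invoice table with one rule and one derived column.
invoices = LinkedTable(
    rows=[{"net": 100.0, "tax_rate": 0.25}, {"net": -5.0, "tax_rate": 0.25}],
    rules=[DeclarativeRule("net_non_negative", lambda r: r["net"] >= 0)],
    derivations={"gross": lambda r: r["net"] * (1 + r["tax_rate"])},
)

print(invoices.violations())
print(invoices.derive())
```

A model that sees only the `rows` has no way to know that a negative `net` is invalid or how `gross` is computed; keeping the rule and the derivation next to the data is one simple way to picture the operational context the abstract refers to.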
Taxonomy
Topics: Complex Systems and Decision Making
