A Pre-training Framework for Relational Data with Information-theoretic Principles
Quang Truong, Zhikai Chen, Mingxuan Ju, Tong Zhao, Neil Shah, Jiliang Tang

TL;DR
This paper introduces Task Vector Estimation (TVE), a pre-training framework for relational databases that leverages information-theoretic principles to create task-aware representations, improving performance on diverse downstream tasks.
Contribution
The paper proposes TVE, a novel pre-training method that models relational dynamics and task heterogeneity by building supervisory signals through set-based aggregation over schema traversal graphs, grounded in information-theoretic principles.
Findings
TVE outperforms traditional pre-training baselines on RelBench.
Task-informed representations retain more relevant signals.
Encoding task heterogeneity improves predictive modeling on relational databases.
Abstract
Relational databases underpin critical infrastructure across a wide range of domains, yet the design of generalizable pre-training strategies for learning from relational databases remains an open challenge due to task heterogeneity. Specifically, there exist many possible downstream tasks, as tasks are defined based on relational schema graphs, temporal dependencies, and SQL-defined label logics. An effective pre-training framework is desired to take these factors into account in order to obtain task-aware representations. By incorporating knowledge of the underlying distribution that drives label generation, downstream tasks can benefit from relevant side-channel information. To bridge this gap, we introduce Task Vector Estimation (TVE), a novel pre-training framework that constructs predictive supervisory signals via set-based aggregation over schema traversal graphs, explicitly…
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Semantic Web and Ontologies
