Octopus: A Lightweight Entity-Aware System for Multi-Table Data Discovery and Cell-Level Retrieval

Wen-Zhi Li; Sainyam Galhotra

arXiv:2601.02304 · cs.DB · January 6, 2026

TL;DR

Octopus is a lightweight, entity-aware system that improves multi-table data discovery and cell-level retrieval by using an LLM parser for entity identification and a compact index, avoiding heavy offline preprocessing.

Contribution

Octopus introduces a training-free, entity-aware approach to multi-table data discovery and cell-level retrieval that the authors report outperforms existing systems in both accuracy and efficiency.

Findings

01. Outperforms existing systems on multi-table discovery tasks.

02. Achieves lower computational and token costs than existing systems.

03. Supports both independent and join-based discovery.

Abstract

Tabular data constitute a dominant form of information in modern data lakes and repositories, yet discovering the relevant tables to answer user questions remains challenging. Existing data discovery systems assume that each question can be answered by a single table and often rely on resource-intensive offline preprocessing, such as model training or large-scale content indexing. In practice, however, many questions require information spread across multiple tables -- either independently or through joins -- and users often seek specific cell values rather than entire tables. In this paper, we present Octopus, a lightweight, entity-aware, and training-free system for multi-table data discovery and cell-level value retrieval. Instead of embedding entire questions, Octopus identifies fine-grained entities (column mentions and value mentions) from natural-language queries using an LLM…
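To make the idea of entity-aware discovery concrete, here is a minimal sketch, not the authors' implementation. It assumes the parser has already split a question into column mentions and value mentions (the `parsed` dictionary below is hand-written and stands in for the LLM output), and it matches those mentions against a toy in-memory index. The table data and the names `TABLES`, `build_index`, and `discover` are all hypothetical. For simplicity the sketch indexes every cell value, which the paper's compact index may well avoid.

```python
from collections import defaultdict

# Hypothetical toy data lake: table name -> {column name: [cell values]}.
TABLES = {
    "employees": {"name": ["Ada", "Grace"], "dept_id": ["d1", "d2"]},
    "departments": {"dept_id": ["d1", "d2"], "budget": ["100", "250"]},
}


def build_index(tables):
    """Map lower-cased column names and cell values to where they occur."""
    column_index = defaultdict(set)  # column name -> {table}
    value_index = defaultdict(set)   # cell value  -> {(table, column)}
    for table, columns in tables.items():
        for column, values in columns.items():
            column_index[column.lower()].add(table)
            for value in values:
                value_index[value.lower()].add((table, column))
    return column_index, value_index


def discover(parsed_query, column_index, value_index):
    """Rank tables by how many parsed entity mentions they match."""
    scores = defaultdict(int)
    for mention in parsed_query["columns"]:
        for table in column_index.get(mention.lower(), ()):
            scores[table] += 1
    for mention in parsed_query["values"]:
        # Count each table once per value mention, even if several columns match.
        for table in {t for t, _ in value_index.get(mention.lower(), ())}:
            scores[table] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Assumed parser output for "What is the budget of Ada's department?"
parsed = {"columns": ["budget"], "values": ["Ada"]}

column_index, value_index = build_index(TABLES)
ranked = discover(parsed, column_index, value_index)
print(ranked)
```

In this toy case the column mention matches only `departments` and the value mention matches only `employees`, so both tables surface. That illustrates the multi-table situation the abstract describes, where no single table answers the question and the two would need to be joined on `dept_id`.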

Taxonomy

Topics: Data Quality and Management · Web Data Mining and Analysis · Advanced Database Systems and Queries