The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent
Muhammad Imam Luthfi Balaka, Raul Castro Fernandez

TL;DR
The Pneuma Project presents Pneuma-Seeker, a system that uses language models to help users articulate, discover, and prepare data aligned with their evolving information needs through iterative, structured interactions.
Contribution
It introduces a novel approach combining relational data modeling, context specialization, and dynamic planning to automate data discovery and preparation workflows.
Findings
Helps surface latent user intent effectively.
Guides discovery to produce fit-for-purpose documents.
Captures institutional knowledge as emergent documentation.
Abstract
Data discovery and preparation remain persistent bottlenecks in the data management lifecycle, especially when user intent is vague, evolving, or difficult to operationalize. The Pneuma Project introduces Pneuma-Seeker, a system that helps users articulate and fulfill information needs through iterative interaction with a language model-powered platform. The system reifies the user's evolving information need as a relational data model and incrementally converges toward a usable document aligned with that intent. To achieve this, the system combines three architectural ideas: context specialization to reduce LLM burden across subtasks, a conductor-style planner to assemble dynamic execution plans, and a convergence mechanism based on shared state. The system integrates recent advances in retrieval-augmented generation (RAG), agentic frameworks, and structured data preparation to support…
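As an illustration of the three architectural ideas named in the abstract, here is a minimal sketch, not Pneuma-Seeker's actual implementation. It assumes the information need can be represented as a target schema held in shared state, that specialized components each handle one narrow subtask, and that a conductor picks the next component until the state converges. Every name in it (`SharedState`, `CATALOG`, `retrieve`, `revise`, `conductor`) is hypothetical, and plain dictionary lookups stand in for the LLM and retrieval calls.

```python
from dataclasses import dataclass, field

# Hypothetical catalog of available tables: table name -> column names.
CATALOG = {
    "orders": ["order_id", "customer_id", "total"],
    "customers": ["customer_id", "region"],
}

@dataclass
class SharedState:
    """Shared state that every component reads and writes."""
    # The information need, reified as a target schema: column -> description.
    target_schema: dict
    # Columns already matched to a source table: column -> table name.
    resolved: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def missing(self):
        return [c for c in self.target_schema if c not in self.resolved]

    def converged(self):
        # Converged when every column of the target schema has a source.
        return not self.missing()

def find_table(column):
    """Return the first catalog table containing the column, or None."""
    for table, columns in CATALOG.items():
        if column in columns:
            return table
    return None

def retrieve(state, column):
    """Specialized component: bind a column to a source table."""
    state.resolved[column] = find_table(column)
    state.log.append(f"retrieve: {column} -> {state.resolved[column]}")

def revise(state, column):
    """Specialized component: drop a column no table can supply
    (a stand-in for asking the user to refine the need)."""
    del state.target_schema[column]
    state.log.append(f"revise: dropped {column}")

def conductor(state, max_steps=10):
    """Build the plan dynamically: choose the next component from the state."""
    for _ in range(max_steps):
        if state.converged():
            break
        column = state.missing()[0]
        step = retrieve if find_table(column) else revise
        step(state, column)
    return state

state = conductor(SharedState({
    "customer_id": "customer key",
    "region": "sales region",
    "churn_score": "predicted churn",
}))
print(state.resolved)
print(state.log)
```

In this sketch the plan is not fixed in advance. The conductor inspects the shared state at each step, so revising the schema mid-run changes which components run next.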
Taxonomy
Topics: Data Visualization and Analytics · Scientific Computing and Data Management · Personal Information Management and User Behavior
