A Hierarchical Multi-Agent System for Autonomous Discovery in Geoscientific Data Archives
Dmitrii Pantiukhin, Ivan Kuznetsov, Boris Shapkin, Antonia Anna Jost, Thomas Jung, Nikolay Koldunov

TL;DR
This paper introduces PANGAEA-GPT, a hierarchical multi-agent system that autonomously discovers and analyzes geoscientific data, improving data utilization through complex, multi-step workflows with minimal human input.
Contribution
It presents a novel hierarchical multi-agent architecture with data-type-aware routing and self-correction, enabling autonomous data discovery and analysis in large geoscientific repositories.
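To make "data-type-aware routing" concrete, here is a minimal sketch of a Supervisor that dispatches each dataset to a Worker registered for its data type. It does not reproduce PANGAEA-GPT's code. `Dataset`, `Supervisor`, `tabular_worker`, `gridded_worker`, and the type labels `"tabular"` and `"gridded"` are all hypothetical. The routing is "strict" in the sense that an unknown type raises an error instead of falling back to a guessed worker.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Dataset:
    # Hypothetical minimal record of a discovered dataset.
    identifier: str
    data_type: str  # hypothetical labels, e.g. "tabular" or "gridded"


def tabular_worker(ds: Dataset) -> str:
    # Placeholder for a Worker specialised in table-like data.
    return f"tabular analysis of {ds.identifier}"


def gridded_worker(ds: Dataset) -> str:
    # Placeholder for a Worker specialised in gridded (e.g. model/reanalysis) data.
    return f"gridded analysis of {ds.identifier}"


class Supervisor:
    """Central node that routes work to Workers by data type (illustrative only)."""

    def __init__(self) -> None:
        self.routes: Dict[str, Callable[[Dataset], str]] = {
            "tabular": tabular_worker,
            "gridded": gridded_worker,
        }

    def dispatch(self, ds: Dataset) -> str:
        worker = self.routes.get(ds.data_type)
        if worker is None:
            # Strict routing: refuse rather than send data to an unsuitable Worker.
            raise ValueError(f"no worker registered for data type {ds.data_type!r}")
        return worker(ds)
```

For example, `Supervisor().dispatch(Dataset("example", "tabular"))` returns `"tabular analysis of example"`. A lookup table keeps the routing rule explicit and auditable, which matters when the Workers run different toolchains.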
Findings
Demonstrated effective autonomous workflows in oceanography and ecology case studies.
Enabled complex multi-step data analysis with minimal human intervention.
Improved data utilization in large geoscientific archives.
Abstract
The rapid accumulation of Earth science data has created a significant scalability challenge; while repositories like PANGAEA host vast collections of datasets, citation metrics indicate that a substantial portion remains underutilized, limiting data reusability. Here we present PANGAEA-GPT, a hierarchical multi-agent framework designed for autonomous data discovery and analysis. Unlike standard Large Language Model (LLM) wrappers, our architecture implements a centralized Supervisor-Worker topology with strict data-type-aware routing, sandboxed deterministic code execution, and self-correction via execution feedback, enabling agents to diagnose and resolve runtime errors. Through use-case scenarios spanning physical oceanography and ecology, we demonstrate the system's capacity to execute complex, multi-step workflows with minimal human intervention. This framework provides a…
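The abstract describes self-correction through execution feedback: generated code runs, and any runtime error is fed back so the agent can revise it. The sketch below illustrates that loop under stated assumptions. Code runs in a separate Python process with a timeout. This gives isolation but is not a security sandbox. The `repair` callable stands in for the LLM step. `run_isolated`, `self_correct`, and `toy_repair` are hypothetical names, not the paper's API.

```python
import subprocess
import sys
from typing import Callable, List, Tuple


def run_isolated(code: str, timeout: float = 10.0) -> Tuple[bool, str]:
    """Run code in a fresh Python process; return (succeeded, stdout or stderr)."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (no user site, no PYTHON* env vars)
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, "TimeoutExpired"
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr


def self_correct(
    code: str,
    repair: Callable[[str, str], str],  # hypothetical stand-in for an LLM revising code from an error
    max_attempts: int = 3,
) -> Tuple[bool, str, List[str]]:
    """Execute, capture errors, and ask `repair` for a revised version until success or budget exhausted."""
    errors: List[str] = []
    for _ in range(max_attempts):
        ok, output = run_isolated(code)
        if ok:
            return True, output, errors
        errors.append(output)
        code = repair(code, output)
    return False, "", errors


def toy_repair(code: str, error: str) -> str:
    # Deterministic toy "agent": fixes a missing import when the traceback reports a NameError.
    if "NameError" in error and "math" in error:
        return "import math\n" + code
    return code
```

Usage: `self_correct("print(math.sqrt(16))", toy_repair)` fails on the first run with a `NameError`, applies the toy fix, and succeeds on the second run with output `4.0`. Bounding the loop with `max_attempts` keeps a stubborn error from consuming unlimited compute.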
Taxonomy
Topics: Scientific Computing and Data Management · Environmental Monitoring and Data Management · Research Data Management Practices
