Multi-Modal Data Exploration via Language Agents
Farhad Nooralahzadeh, Yi Zhang, Jonathan Furst, Kurt Stockinger

TL;DR
This paper introduces M$^2$EX, a system that uses language agents and large language models to enable natural language querying across structured databases and unstructured data like text and images, improving multi-modal data exploration.
Contribution
The paper presents a novel LLM-based framework that decomposes complex multi-modal queries into subtasks and orchestrates modality-specific experts for efficient data exploration.
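As a rough illustration of the decompose-and-orchestrate pattern described above, the following is a minimal sketch and not the authors' implementation. Every name in it (`Subtask`, `decompose`, `run_plan`, `EXPERTS`, the two expert functions) is hypothetical, the planner is a hardcoded stand-in for what would be an LLM call, and the experts return canned values in place of real text-to-SQL or image analysis.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Subtask:
    task_id: str
    expert: str                      # key of the modality-specific expert to call
    instruction: str                 # natural-language instruction for that expert
    depends_on: List[str] = field(default_factory=list)


def sql_expert(instruction: str, inputs: List[Any]) -> List[str]:
    # Hypothetical stand-in for a text-to-SQL expert: returns canned image ids.
    return ["img_001.png", "img_002.png"]


def image_expert(instruction: str, inputs: List[Any]) -> List[str]:
    # Hypothetical stand-in for an image-analysis expert: pretends that only
    # the first image handed over by the upstream subtask matches.
    images = inputs[0]
    return images[:1]


EXPERTS: Dict[str, Callable[[str, List[Any]], Any]] = {
    "sql": sql_expert,
    "image": image_expert,
}


def decompose(question: str) -> List[Subtask]:
    # Assumption: a real system would ask an LLM to produce this plan.
    # Here the plan is fixed so the example runs without any external service.
    return [
        Subtask("t1", "sql", "Find images linked to the requested records"),
        Subtask("t2", "image", "Keep images showing the requested content", ["t1"]),
    ]


def run_plan(plan: List[Subtask], experts: Dict[str, Callable]) -> Dict[str, Any]:
    # Execute subtasks in dependency order, passing upstream results downstream.
    results: Dict[str, Any] = {}
    pending = list(plan)
    while pending:
        ready = [t for t in pending if all(d in results for d in t.depends_on)]
        if not ready:
            raise ValueError("plan has unmet or cyclic dependencies")
        for task in ready:
            inputs = [results[d] for d in task.depends_on]
            results[task.task_id] = experts[task.expert](task.instruction, inputs)
            pending.remove(task)
    return results


if __name__ == "__main__":
    answers = run_plan(decompose("Which linked images show the requested content?"), EXPERTS)
    print(answers)
```

Subtasks whose dependencies are all satisfied are collected together in each pass, which is the point where independent steps could be dispatched concurrently in a fuller design.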
Findings
Outperforms state-of-the-art multi-modal exploration systems in accuracy.
Reduces query latency and API costs.
Enhances reasoning capabilities for multi-modal data querying.
Abstract
International enterprises, organizations, and hospitals collect large amounts of multi-modal data stored in databases, text documents, images, and videos. While there has been recent progress in the separate fields of multi-modal data exploration as well as in database systems that automatically translate natural language questions to database query languages, the research challenge of querying both structured databases and unstructured modalities (e.g., texts, images) in natural language remains largely unexplored. In this paper, we propose M$^2$EX, a system that enables multi-modal data exploration via language agents. Our approach is based on the following research contributions: (1) Our system is inspired by a real-world use case that enables users to explore multi-modal information systems. (2) M$^2$EX leverages an LLM-based agentic AI framework to decompose a natural language…
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling
