Multi-Modal Data Exploration via Language Agents

Farhad Nooralahzadeh; Yi Zhang; Jonathan Furst; Kurt Stockinger

arXiv:2412.18428·cs.AI·November 26, 2025

TL;DR

This paper introduces M$^2$EX, a system that uses language agents and large language models to enable natural language querying across structured databases and unstructured data like text and images, improving multi-modal data exploration.

Contribution

The paper presents a novel LLM-based framework that decomposes complex multi-modal queries into subtasks and orchestrates modality-specific experts for efficient data exploration.
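
The decompose-then-orchestrate idea can be illustrated with a minimal sketch. This is not the paper's implementation: the `Subtask` structure, the `sql_expert` and `image_expert` functions, and the hard-coded plan are all hypothetical stand-ins, and a real system would have an LLM produce the plan and real models act as experts. The sketch only shows subtasks with dependencies being executed in dependency order, each dispatched to an expert chosen by modality.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List


@dataclass
class Subtask:
    """One step of a decomposed query (hypothetical structure)."""
    task_id: str
    modality: str                      # selects the expert, e.g. "sql" or "image"
    instruction: str
    depends_on: List[str] = field(default_factory=list)


def sql_expert(instruction: str, inputs: Dict[str, Any]) -> List[str]:
    # Stand-in for a text-to-SQL expert; returns fixed image ids.
    return ["img_001", "img_002"]


def image_expert(instruction: str, inputs: Dict[str, Any]) -> Dict[str, str]:
    # Stand-in for an image-analysis expert; answers "yes" for every
    # image id produced by the upstream subtasks.
    image_ids = [i for result in inputs.values() for i in result]
    return {image_id: "yes" for image_id in image_ids}


# Registry mapping a modality name to its (hypothetical) expert.
EXPERTS: Dict[str, Callable[[str, Dict[str, Any]], Any]] = {
    "sql": sql_expert,
    "image": image_expert,
}


def run_plan(plan: List[Subtask]) -> Dict[str, Any]:
    """Execute subtasks in dependency order and collect their results."""
    by_id = {task.task_id: task for task in plan}
    graph = {task.task_id: set(task.depends_on) for task in plan}
    results: Dict[str, Any] = {}
    for task_id in TopologicalSorter(graph).static_order():
        task = by_id[task_id]
        inputs = {dep: results[dep] for dep in task.depends_on}
        results[task_id] = EXPERTS[task.modality](task.instruction, inputs)
    return results


# A hand-written plan standing in for what a planner might emit.
plan = [
    Subtask("t2", "image", "Answer a question about each image", depends_on=["t1"]),
    Subtask("t1", "sql", "Retrieve the ids of the relevant images"),
]
results = run_plan(plan)
print(results)
```

Representing the plan as a dependency graph rather than a fixed sequence means independent subtasks could later be run in parallel without changing how the plan is expressed; the sketch above runs them sequentially for simplicity.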

Findings

01 Outperforms state-of-the-art multi-modal exploration systems in accuracy.

02 Reduces query latency and API costs.

03 Enhances reasoning capabilities for multi-modal data querying.

Abstract

International enterprises, organizations, and hospitals collect large amounts of multi-modal data stored in databases, text documents, images, and videos. While there has been recent progress in the separate fields of multi-modal data exploration as well as in database systems that automatically translate natural language questions to database query languages, the research challenge of querying both structured databases and unstructured modalities (e.g., texts, images) in natural language remains largely unexplored. In this paper, we propose M$^2$EX, a system that enables multi-modal data exploration via language agents. Our approach is based on the following research contributions: (1) Our system is inspired by a real-world use case that enables users to explore multi-modal information systems. (2) M$^2$EX leverages an LLM-based agentic AI framework to decompose a natural language…

Code & Models

Repositories

yizhang-unifr/xmode (Official)

Taxonomy

Topics: Semantic Web and Ontologies · Topic Modeling