A Multimodal Conversational Agent for Tabular Data Analysis
Mohammad Nour Al Awad, Sergey Ivanov, Olga Tikhonova, Ivan Khodnenko

TL;DR
This paper introduces Talk2Data, a multimodal conversational agent that uses large language models to let users explore and analyze tabular datasets interactively through voice or text.
Contribution
The paper presents a novel multimodal LLM-driven system that integrates speech recognition, code generation, and text-to-speech to facilitate interactive data exploration across modalities.
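The flow described above (speech recognition, then code generation, then execution, then an optional spoken reply) can be sketched as a small orchestration loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Agent` and `Turn` classes and every `fake_*` function are hypothetical stand-ins, and no real Whisper, Qwen-coder, or Coqui call is made. Each component is injected as a plain callable so that real backends could be substituted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Turn:
    """One completed exchange, kept so later turns can use dialogue context."""
    query: str
    code: str
    result: str


@dataclass
class Agent:
    """Hypothetical orchestrator; all four components are injected callables."""
    transcribe: Callable[[bytes], str]                # ASR: audio -> text
    generate_code: Callable[[str, List[Turn]], str]   # LLM: query + history -> code
    execute: Callable[[str], str]                     # sandbox: code -> result text
    speak: Callable[[str], bytes]                     # TTS: text -> audio
    history: List[Turn] = field(default_factory=list)

    def handle(self, user_input: Union[str, bytes], spoken_reply: bool = False):
        # Voice input arrives as bytes and is transcribed; text passes through.
        if isinstance(user_input, bytes):
            query = self.transcribe(user_input)
        else:
            query = user_input
        # The code generator sees earlier turns, which is what makes the
        # dialogue multi-turn rather than a series of isolated questions.
        code = self.generate_code(query, self.history)
        result = self.execute(code)
        self.history.append(Turn(query, code, result))
        return self.speak(result) if spoken_reply else result


# Stub components (assumptions for illustration only).
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(query: str, history: List[Turn]) -> str:
    return f"# turn {len(history) + 1}\nanswer({query!r})"


def fake_sandbox(code: str) -> str:
    # A real sandbox would run the code in isolation; this stub only inspects it.
    return f"ran {len(code.splitlines())} lines"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


agent = Agent(fake_asr, fake_llm, fake_sandbox, fake_tts)
text_reply = agent.handle("mean of column price")
voice_reply = agent.handle(b"plot price by month", spoken_reply=True)
```

Keeping the components behind callables means the loop itself stays independent of any particular ASR, LLM, or TTS library.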
Findings
Talk2Data achieved 95.8% accuracy on 48 tasks across three datasets.
Model-only generation time stayed under 1.7 seconds.
The 7B model offered the best balance of accuracy, latency, and cost.
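As a quick sanity check on the accuracy figure, the snippet below finds which whole-number counts of correct tasks out of 48 would round to 95.8%. It assumes the figure comes from a single pass over the 48 tasks, which the summary does not state; if accuracy was averaged over several runs, the count need not be an integer.

```python
TOTAL_TASKS = 48
REPORTED_ACCURACY = 95.8  # percent, as reported in the findings

# Keep every count whose percentage is within rounding distance
# (half of the last reported decimal place) of the reported figure.
consistent_counts = [
    k for k in range(TOTAL_TASKS + 1)
    if abs(100 * k / TOTAL_TASKS - REPORTED_ACCURACY) < 0.05
]
print(consistent_counts)
```

Under that single-pass assumption, the result corresponds to 46 correct tasks and 2 failures.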
Abstract
Large language models (LLMs) can reshape information processing by handling data analysis, visualization, and interpretation in an interactive, context-aware dialogue with users, including voice interaction, while maintaining high performance. In this article, we present Talk2Data, a multimodal LLM-driven conversational agent for intuitive data exploration. The system lets users query datasets with voice or text instructions and receive answers as plots, tables, statistics, or spoken explanations. Built on LLMs, the proposed design combines the OpenAI Whisper automatic speech recognition (ASR) system, the Qwen-coder code-generation LLM, custom sandboxed execution tools, and the Coqui library for text-to-speech (TTS) within an agentic orchestration loop. Unlike text-only analysis tools, it adapts responses across modalities and supports multi-turn dialogues grounded in dataset context. In an…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · AI in Service Interactions
