FOCAL: A Novel Benchmarking Technique for Multi-modal Agents

Anupam Purwar; Aditya Choudhary

arXiv:2601.07367 · cs.SD · March 3, 2026

Open Access

TL;DR

FOCAL is a new benchmarking framework designed to evaluate multi-modal voice and text agents, focusing on reasoning, error propagation, and conversation quality, with novel metrics for assessing agent efficacy.

Contribution

The paper introduces FOCAL, a benchmarking framework with two new metrics, the Reasoning and Semantic scores, for analyzing reasoning and semantic quality in multi-modal agents.

Findings

01 Effective end-to-end reasoning evaluation

02 Component-wise error analysis capabilities

03 Novel Reasoning and Semantic scores for conversation quality

Abstract

With recent advances in reasoning capabilities, tool calling via MCP servers, and Audio Language Models (ALMs), the development and integration of multi-modal agents (with voice and text support) have come to the forefront of the industry. Cascading pipelines for voice agents still play a central role in industry owing to the superior reasoning capabilities of the LLMs they build on. However, cascading pipelines often propagate errors from one component to the next. We propose a framework, FOCAL, to benchmark end-to-end reasoning, component-wise error propagation, and error analysis for automated as well as human-assisted testing of multi-modal agents (voice-to-voice plus text input). We also share two novel metrics, the Reasoning and Semantic scores, to evaluate the efficacy of an agent in holding meaningful conversations in voice mode.

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Speech Recognition and Synthesis