CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

Yexing Du; Kaiyuan Liu; Youcheng Pan; Zheng Chu; Bo Yang; Xiaocheng Feng; Ming Liu; Yang Xiang

arXiv:2508.07295 · cs.CL · January 28, 2026

TL;DR

The paper introduces CCFQA, a comprehensive benchmark for evaluating the factuality of multilingual, cross-modal large language models, highlighting current challenges and proposing a few-shot transfer learning method to improve speech question answering.

Contribution

It presents the CCFQA benchmark for multilingual, cross-modal factuality evaluation and a few-shot transfer learning approach to enhance spoken question answering in MLLMs.

Findings

01. Current MLLMs struggle with the CCFQA benchmark.

02. Few-shot transfer learning improves multilingual spoken QA performance.

03. The benchmark promotes development of more reliable speech understanding in MLLMs.

Abstract

As Large Language Models (LLMs) are deployed more widely across languages, keeping their output factual and free of hallucinations becomes increasingly important. However, existing benchmarks for evaluating the reliability of Multimodal Large Language Models (MLLMs) focus predominantly on textual or visual modalities, with a primary emphasis on English, which leaves a gap in evaluating multilingual input, especially speech. To bridge this gap, we propose a Cross-lingual and Cross-modal Factuality benchmark (CCFQA). Specifically, CCFQA contains parallel speech-text factual questions across 8 languages, designed to systematically evaluate MLLMs' cross-lingual and cross-modal factuality capabilities. Our experimental results demonstrate that current MLLMs still face substantial challenges on the CCFQA benchmark. Furthermore, we propose a few-shot transfer…
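
Because the benchmark pairs each factual question across languages and across speech and text, results can be broken down by (language, modality) condition. The following is a minimal sketch of such a breakdown, not the paper's evaluation code: the `Result` record, its field names, and the "correct in both modalities" consistency measure are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    """One graded model answer (hypothetical schema, not the dataset's)."""
    question_id: str   # shared by all parallel versions of a question
    language: str      # e.g. "en"
    modality: str      # "text" or "speech"
    correct: bool      # outcome of whatever grading procedure is used


def accuracy_by_condition(results):
    """Return accuracy for each (language, modality) pair."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for r in results:
        key = (r.language, r.modality)
        totals[key] += 1
        hits[key] += int(r.correct)
    return {key: hits[key] / totals[key] for key in totals}


def cross_modal_consistency(results, language):
    """Fraction of questions in one language answered correctly in BOTH
    text and speech. This measure is an assumption made for this sketch."""
    by_question = defaultdict(dict)
    for r in results:
        if r.language == language:
            by_question[r.question_id][r.modality] = r.correct
    # Only questions graded in both modalities count as paired.
    paired = [m for m in by_question.values() if "text" in m and "speech" in m]
    if not paired:
        return 0.0
    return sum(m["text"] and m["speech"] for m in paired) / len(paired)


# Toy data: two parallel English questions, each asked as text and as speech.
toy_results = [
    Result("q1", "en", "text", True),
    Result("q1", "en", "speech", False),
    Result("q2", "en", "text", True),
    Result("q2", "en", "speech", True),
]
toy_accuracy = accuracy_by_condition(toy_results)
toy_consistency = cross_modal_consistency(toy_results, "en")
```

Keying every record on a shared `question_id` is what lets the parallel design pay off: a gap between the text and speech scores for the same questions can then be attributed to modality rather than to question difficulty.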

Code & Models

Datasets

yxdu/ccfqa
dataset · 36 downloads

Videos

CCFQA: A Benchmark for Cross-Lingual and Cross-Modal Speech and Text Factuality Evaluation

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning