# ViSQA: A benchmark dataset and baseline models for Vietnamese spoken question answering

**Authors:** Le Trong Minh, Nguyen Duc Thinh, Nguyen Khanh Tho Loc, Le Van Quan, Ngo Duc Tam, Le Hoang Son

PMC · DOI: 10.1371/journal.pone.0340771 · PLOS One · 2026-01-12

## TL;DR

ViSQA is the first benchmark for Vietnamese spoken question answering, enabling evaluation of models under varying transcription quality.

## Contribution

ViSQA introduces the first standardized Vietnamese SQA benchmark: over 13,000 question–answer pairs derived from the UIT-ViQuAD corpus and aligned with spoken inputs, in both clean and noise-degraded audio variants.

## Key findings

- ASR errors substantially degrade model performance, e.g., ViT5 exact match (EM) drops from 62.04% to 36.30%, a loss of 25.74 points.
- Training on spoken transcriptions improves robustness, raising ViT5 EM from 36.30% to 50.70%, a gain of 14.40 points.
- ViSQA enables systematic analysis of ASR error impact on downstream reasoning.
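
The EM figures above can be made concrete with a minimal sketch of exact-match scoring. This is an illustration only, not the authors' evaluation script: the normalization steps (Unicode NFC, lowercasing, ASCII punctuation removal, whitespace collapsing) and the helper names `normalize`, `exact_match`, `em_score`, and `point_change` are assumptions introduced here.

```python
import string
import unicodedata


def normalize(text: str) -> str:
    """Assumed normalization: NFC, lowercase, drop ASCII punctuation, collapse spaces."""
    text = unicodedata.normalize("NFC", text).lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, references: list[str]) -> int:
    """Return 1 if the normalized prediction equals any normalized reference, else 0."""
    pred = normalize(prediction)
    return int(any(pred == normalize(ref) for ref in references))


def em_score(predictions: list[str], references: list[list[str]]) -> float:
    """Percentage of predictions that exactly match one of their references."""
    hits = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * hits / len(predictions)


def point_change(before: float, after: float) -> float:
    """Absolute change in percentage points, rounded to two decimals."""
    return round(after - before, 2)


# Reported ViT5 EM values from the summary above.
asr_drop = point_change(62.04, 36.30)       # effect of ASR errors
training_gain = point_change(36.30, 50.70)  # effect of training on spoken transcriptions
```

NFC normalization is included because Vietnamese diacritics can be encoded in composed or decomposed form, and a byte-level comparison would otherwise count visually identical answers as mismatches.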

## Abstract

Spoken Question Answering (SQA) extends machine reading comprehension to spoken content and requires models to handle both automatic speech recognition (ASR) errors and downstream language understanding. Although large-scale SQA benchmarks exist for high-resource languages, Vietnamese remains underexplored due to the lack of standardized datasets. This paper introduces ViSQA, the first benchmark for Vietnamese Spoken Question Answering. ViSQA extends the UIT-ViQuAD corpus using a reproducible text-to-speech and ASR pipeline, resulting in over 13,000 question–answer pairs aligned with spoken inputs. The dataset includes clean and noise-degraded audio variants to enable systematic evaluation under varying transcription quality. Experiments with five transformer-based models show that ASR errors substantially degrade performance (e.g., ViT5 EM: 62.04% → 36.30%), while training on spoken transcriptions improves robustness (ViT5 EM: 36.30% → 50.70%). ViSQA provides a rigorous benchmark for evaluating Vietnamese SQA systems and enables systematic analysis of the impact of ASR errors on downstream reasoning.
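
"Transcription quality" is commonly quantified with word error rate (WER), the word-level edit distance between a reference text and an ASR transcript divided by the reference length. The abstract does not state which measure the authors use, so the sketch below is an assumption for illustration; `word_error_rate` is a hypothetical helper, and the tokenization is a plain whitespace split.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("thủ đô của việt nam", "thủ đô của việt lam")
```

Comparing WER on the clean and noise-degraded variants would give one way to relate transcript quality to the downstream EM changes reported above.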

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12795348/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12795348/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/PMC12795348/full.md

---
Source: https://tomesphere.com/paper/PMC12795348