RespondeoQA: a Benchmark for Bilingual Latin-English Question Answering

Marisa Hudspeth; Patrick J. Burns; Brendan O'Connor

arXiv:2604.20738 · cs.CL · April 23, 2026

TL;DR

RespondeoQA is a bilingual Latin-English question answering benchmark of about 7,800 question-answer pairs, built to evaluate language models in a specialized, culturally rich domain.

Contribution

To the authors' knowledge, the first QA benchmark centered on Latin. It pairs a diverse dataset with an evaluation of three large language models that highlights their weaknesses on skill-oriented questions.

Findings

01

All three evaluated models (LLaMa 3, QwQ, and o3-mini) perform worse on skill-oriented questions.

02

Reasoning models perform better on scansion and literary-device tasks but offer limited improvement overall.

03

QwQ performs slightly better on Latin questions, while LLaMa 3 and o3-mini are more task-dependent.

Abstract

We introduce a benchmark dataset for question answering and translation in bilingual Latin and English settings, containing about 7,800 question-answer pairs. The questions are drawn from Latin pedagogical sources, including exams, quizbowl-style trivia, and textbooks ranging from the 1800s to the present. After automated extraction, cleaning, and manual review, the dataset covers a diverse range of question types: knowledge- and skill-based, multihop reasoning, constrained translation, and mixed language pairs. To our knowledge, this is the first QA benchmark centered on Latin. As a case study, we evaluate three large language models -- LLaMa 3, Qwen QwQ, and OpenAI's o3-mini -- finding that all perform worse on skill-oriented questions. Although the reasoning models perform better on scansion and literary-device tasks, they offer limited improvement overall. QwQ performs slightly…
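The comparison across question types described in the abstract can be illustrated with a per-type accuracy breakdown. The following is a minimal sketch only: the field names `type` and `answer`, the exact-match scoring rule, and the toy data are all assumptions made for illustration, and none of them is taken from the paper or its repository.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def accuracy_by_type(examples, predictions):
    """Return exact-match accuracy for each question type.

    examples: list of dicts with the hypothetical keys "type" and "answer".
    predictions: list of model answer strings, aligned with examples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex, pred in zip(examples, predictions):
        total[ex["type"]] += 1
        if normalize(pred) == normalize(ex["answer"]):
            correct[ex["type"]] += 1
    return {t: correct[t] / total[t] for t in total}


# Toy data invented for illustration; not drawn from the benchmark.
examples = [
    {"type": "knowledge", "answer": "Vergil"},
    {"type": "knowledge", "answer": "Cicero"},
    {"type": "skill", "answer": "amavit"},
    {"type": "skill", "answer": "puellae"},
]
predictions = ["vergil", "Cicero", "amabat", "puellae"]

scores = accuracy_by_type(examples, predictions)
print(scores)
```

Exact match is used here only because it is the simplest rule to show; free-form answers and translations would likely need a more lenient scoring scheme.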

Code & Models

Repositories

slanglab/RespondeoQA (GitHub)
