MARCA: A Checklist-Based Benchmark for Multilingual Web Search

Thales Sales Almeida; Giovana Kerche Bonás; Ramon Pires; Celio Larcher; Hugo Abonizio; Marcos Piau; Roseval Malaquias Junior; Rodrigo Nogueira; Thiago Laitz

arXiv:2604.14448·cs.CL·April 17, 2026

TL;DR

MARCA is a bilingual (English and Portuguese) benchmark for evaluating the web search capabilities of LLMs, scoring answers on completeness and correctness.

Contribution

It introduces a new bilingual benchmark of manually authored questions paired with manually validated checklist-style rubrics, and evaluates models under two interaction frameworks (Basic and Orchestrator) to analyze performance variability.

Findings

01 Orchestration improves models' answer coverage.

02 Performance differs widely across models.

03 Transfer of performance from English to Portuguese varies significantly.

Abstract

Large language models (LLMs) are increasingly used as sources of information, yet their reliability depends on the ability to search the web, select relevant evidence, and synthesize complete answers. While recent benchmarks evaluate web-browsing and agentic tool use, multilingual settings, and Portuguese in particular, remain underexplored. We present MARCA, a bilingual (English and Portuguese) benchmark for evaluating LLMs on web-based information seeking. MARCA consists of 52 manually authored multi-entity questions, paired with manually validated checklist-style rubrics that explicitly measure answer completeness and correctness. We evaluate 14 models under two interaction settings: a Basic framework with direct web search and scraping, and an Orchestrator framework that enables task decomposition via delegated subagents. To capture stochasticity, each question is…
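To make the checklist idea concrete, here is a minimal sketch of how a checklist-style rubric could be turned into a score. It is an illustration, not the paper's scoring procedure, which this excerpt does not spell out. It assumes that each rubric item is judged as satisfied or not, that a run's score is the fraction of satisfied items, and that repeated runs are summarized by their mean and spread. The function names `run_score` and `aggregate_runs` and the sample judgments are hypothetical.

```python
from statistics import mean, pstdev


def run_score(judgments: list[bool]) -> float:
    """Fraction of checklist items judged as satisfied in a single run.

    Assumption: every item carries equal weight.
    """
    if not judgments:
        raise ValueError("a rubric needs at least one checklist item")
    return sum(judgments) / len(judgments)


def aggregate_runs(runs: list[list[bool]]) -> tuple[float, float]:
    """Mean and population standard deviation of the per-run scores."""
    scores = [run_score(judgments) for judgments in runs]
    return mean(scores), pstdev(scores)


# Made-up judgments for one question with a four-item rubric,
# answered three times by the same (hypothetical) model.
runs = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]

mean_score, spread = aggregate_runs(runs)
print(f"mean coverage: {mean_score:.2f}, spread across runs: {spread:.2f}")
```

Reporting the spread alongside the mean is one simple way to expose run-to-run variability; other aggregation choices, such as weighted items or best-of-n, are equally plausible under the description above.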

Code & Models

Repositories

maritaca-ai/MARCA (GitHub)
