CAPITU: A Benchmark for Evaluating Instruction-Following in Brazilian Portuguese with Literary Context
Giovana Kerche Bonás, Roseval Malaquias Junior, Marcos Piau, Thiago Laitz, Thales Sales Almeida, Hugo Abonizio, Celio Larcher, Ramon Pires, Rodrigo Nogueira

TL;DR
CAPITU is a benchmark for evaluating how well Large Language Models follow instructions in Brazilian Portuguese, using culturally grounded literary tasks whose constraints are verified automatically.
Contribution
The paper introduces a benchmark of culturally contextualized, automatically verifiable tasks in Portuguese, covering both linguistic and structural constraints, and reports an evaluation of state-of-the-art models on it.
Findings
Frontier reasoning models reach high accuracy (GPT-5.2: 98.5%)
Portuguese-specialized models offer cost-efficient performance
Multi-turn evaluation shows significant variation in constraint persistence
Abstract
We introduce CAPITU, a benchmark for evaluating instruction-following capabilities of Large Language Models (LLMs) in Brazilian Portuguese. Unlike existing benchmarks that focus on English or use generic prompts, CAPITU contextualizes all tasks within eight canonical works of Brazilian literature, combining verifiable instruction constraints with culturally-grounded content. The benchmark comprises 59 instruction types organized into seven categories, all designed to be automatically verifiable without requiring LLM judges or human evaluation. Instruction types include Portuguese-specific linguistic constraints (word termination patterns like -ando/-endo/-indo, -inho/-inha, -mente) and structural requirements. We evaluate 18 state-of-the-art models across single-turn and multi-turn settings. Our results show that frontier reasoning models achieve strong performance (GPT-5.2 with…
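The word-termination constraints mentioned in the abstract lend themselves to rule-based checking. The following is a minimal sketch of how such a verifier might look, not the benchmark's actual implementation: the names `SUFFIX_GROUPS`, `count_words_with_suffix`, and `check_min_suffix_words` are hypothetical, and the "at least N words" form of the constraint is an assumption made for illustration.

```python
import re

# Hypothetical suffix groups, based only on the examples named in the abstract.
SUFFIX_GROUPS = {
    "gerund": ("ando", "endo", "indo"),
    "diminutive": ("inho", "inha"),
    "adverb": ("mente",),
}

# Runs of Unicode letters, so accented Portuguese words stay intact.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_words_with_suffix(text, suffixes):
    """Count the words in `text` that end with any of the given suffixes."""
    words = WORD_RE.findall(text.lower())
    return sum(1 for word in words if word.endswith(suffixes))


def check_min_suffix_words(text, group, minimum):
    """Assumed constraint form: at least `minimum` words from a suffix group."""
    return count_words_with_suffix(text, SUFFIX_GROUPS[group]) >= minimum


response = "Capitu estava olhando o mar, sorrindo lentamente."
print(check_min_suffix_words(response, "gerund", 2))      # True
print(check_min_suffix_words(response, "diminutive", 1))  # False
```

This check is purely orthographic: a word such as "quando" would count toward the -ando pattern even though it is not a gerund. A real verifier would need to decide whether that counts as compliance.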
Taxonomy
Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling
