THIVLVC: Retrieval Augmented Dependency Parsing for Latin
Luc Pommeret (STL), Thibault Wagret (ENS de Lyon, HiSoMA), Jules Deret

TL;DR
THIVLVC is a two-stage retrieval-augmented system for Latin dependency parsing: it retrieves structurally similar treebank sentences and prompts a large language model to refine a UDPipe baseline parse, with the largest gain on poetry (+17 CLAS on Seneca).
Contribution
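The paper introduces a retrieval-augmented approach to Latin dependency parsing: structurally similar sentences are retrieved from the CIRCSE treebank by sentence length and POS n-gram similarity, and an LLM then refines the UDPipe baseline parse using those examples and the UD annotation guidelines.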
Findings
+17 CLAS points on poetry (Seneca) over baseline
+1.5 CLAS points on prose (Thomas Aquinas)
In a double-blind error analysis of 300 divergences between THIVLVC and the gold standard, 53.3% of the unanimous annotator decisions favour THIVLVC.
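For readers unfamiliar with the metric, CLAS (content-word labeled attachment score) counts a token as correct when both its head and its relation label match the gold parse, and it only scores tokens whose relation is a content relation. The sketch below is a simplified illustration, not the official shared-task scorer: it assumes gold and system parses share the same tokenization, and the `clas` function and its input format (a list of `(head, deprel)` pairs per sentence) are hypothetical choices made for this example.

```python
# Simplified CLAS sketch (assumption: identical tokenization in both parses).
# Content relations follow the Universal Dependencies content/function split;
# function relations (aux, case, cc, clf, cop, det, mark) and punct are skipped.
CONTENT_DEPRELS = {
    "nsubj", "obj", "iobj", "csubj", "ccomp", "xcomp", "obl", "vocative",
    "expl", "dislocated", "advcl", "advmod", "discourse", "nmod", "appos",
    "nummod", "acl", "amod", "conj", "fixed", "flat", "compound", "list",
    "parataxis", "orphan", "goeswith", "reparandum", "root", "dep",
}


def base(deprel):
    """Strip language-specific subtypes, e.g. 'obl:arg' -> 'obl'."""
    return deprel.split(":")[0]


def clas(gold, system):
    """Return CLAS as an F1 score in [0, 1].

    gold, system: lists of (head_index, deprel) pairs, one per token.
    """
    gold_n = sum(base(rel) in CONTENT_DEPRELS for _, rel in gold)
    sys_n = sum(base(rel) in CONTENT_DEPRELS for _, rel in system)
    correct = sum(
        1
        for (g_head, g_rel), (s_head, s_rel) in zip(gold, system)
        if base(g_rel) in CONTENT_DEPRELS
        and g_head == s_head
        and base(g_rel) == base(s_rel)
    )
    if correct == 0:
        return 0.0
    precision = correct / sys_n
    recall = correct / gold_n
    return 2 * precision * recall / (precision + recall)


# Toy four-token sentence: the system mislabels one content word (obj vs obl).
gold_parse = [(2, "nsubj"), (0, "root"), (2, "obj"), (3, "case")]
system_parse = [(2, "nsubj"), (0, "root"), (2, "obl"), (3, "case")]
score = clas(gold_parse, system_parse)
```

In the toy sentence, two of the three content words are attached and labeled correctly, so the score is 2/3; the `case` token does not enter the count either way.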
Abstract
We describe THIVLVC, a two-stage system for the EvaLatin 2026 Dependency Parsing task. Given a Latin sentence, we retrieve structurally similar entries from the CIRCSE treebank using sentence length and POS n-gram similarity, then prompt a large language model to refine the baseline parse from UDPipe using the retrieved examples and UD annotation guidelines. We submit two configurations: one without retrieval and one with retrieval (RAG). On poetry (Seneca), THIVLVC improves CLAS by +17 points over the UDPipe baseline; on prose (Thomas Aquinas), the gain is +1.5 CLAS. A double-blind error analysis of 300 divergences between our system and the gold standard reveals that, among unanimous annotator decisions, 53.3% favour THIVLVC, showing annotation inconsistencies both within and across treebanks.
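The retrieval stage described above can be sketched as follows. The abstract states only that retrieval uses sentence length and POS n-gram similarity; the exact scoring function is not given here, so this sketch assumes a Jaccard overlap over POS bigrams mixed with a length ratio. The function names (`pos_ngrams`, `similarity`, `retrieve`), the `length_weight` parameter, and the dictionary layout of treebank entries are all hypothetical and chosen for illustration only.

```python
from collections import Counter


def pos_ngrams(tags, n=2):
    """Multiset of POS n-grams for one sentence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def similarity(query_tags, cand_tags, n=2, length_weight=0.5):
    """Score in [0, 1]; higher means structurally closer.

    Assumption: an equal-weight mix of multiset Jaccard over POS n-grams
    and the ratio of the shorter to the longer sentence length.
    """
    q, c = pos_ngrams(query_tags, n), pos_ngrams(cand_tags, n)
    union = sum((q | c).values())
    ngram_sim = sum((q & c).values()) / union if union else 0.0
    longest = max(len(query_tags), len(cand_tags))
    length_sim = min(len(query_tags), len(cand_tags)) / longest if longest else 1.0
    return (1 - length_weight) * ngram_sim + length_weight * length_sim


def retrieve(query_tags, treebank, k=3):
    """Return the k treebank entries most similar to the query sentence."""
    ranked = sorted(
        treebank,
        key=lambda entry: similarity(query_tags, entry["upos"]),
        reverse=True,
    )
    return ranked[:k]


# Toy treebank with made-up entries (not taken from CIRCSE).
toy_treebank = [
    {"id": "a", "upos": ["ADJ", "NOUN", "VERB", "ADV", "CCONJ", "VERB"]},
    {"id": "b", "upos": ["NOUN", "VERB", "NOUN"]},
    {"id": "c", "upos": ["NOUN", "VERB"]},
]
top = retrieve(["NOUN", "VERB", "NOUN"], toy_treebank, k=2)
top_ids = [entry["id"] for entry in top]
```

In a pipeline of this shape, the retrieved entries and their gold parses would then be placed in the LLM prompt alongside the UDPipe baseline parse; how the prompt is actually constructed is not specified in the abstract.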
