Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding
Anton Bazdyrev, Ivan Bashtovyi, Ivan Havlytskyi, Oleksandr Kharytonov, Artur Khodakovskyi

TL;DR
This paper presents a retrieval-augmented system for Ukrainian multi-domain document understanding. The system answers multiple-choice questions over PDF collections using question-aware dense retrieval, reranking, and constrained answer generation.
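The retrieve-then-rerank idea can be sketched in plain Python. The sketch assumes that chunk and query embeddings are already computed (in the paper's system they come from Qwen3-Embedding-8B) and treats the reranker as an arbitrary scoring callable standing in for the fine-tuned Qwen3-Reranker-8B. The names `question_aware_query` and `retrieve_then_rerank`, the shortlist size `n_retrieve=20`, and the way options are joined into the query are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Callable, List, Sequence


def question_aware_query(question: str, options: Sequence[str]) -> str:
    # Hypothetical query format: condition retrieval on the answer options, not just the question.
    return question + "\n" + "\n".join(options)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero norm.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_then_rerank(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    query_text: str,
    chunk_texts: Sequence[str],
    rerank_score: Callable[[str, str], float],
    n_retrieve: int = 20,
    k: int = 2,
) -> List[int]:
    # Stage 1: cheap dense retrieval, keeping the n_retrieve most similar chunks.
    dense = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )[:n_retrieve]
    # Stage 2: re-score only that shortlist with the (more expensive) reranker.
    reranked = sorted(dense, key=lambda i: rerank_score(query_text, chunk_texts[i]), reverse=True)
    # Return indices of the top-k chunks after reranking.
    return reranked[:k]
```

Only the dense shortlist reaches the reranker. This split keeps the cross-scoring model's cost bounded regardless of collection size.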
Contribution
The paper introduces a pipeline for Ukrainian document comprehension that combines contextual PDF chunking, question-aware retrieval and reranking, and constrained answer generation, all built on Qwen3 models.
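This summary does not describe the paper's exact chunking scheme. One plausible reading of "contextual chunking" is to split each PDF page into overlapping word windows and prefix every chunk with its document title and page number, so the retriever sees that context and each hit can be traced back to a page. The `Chunk` type, the `contextual_chunks` helper, the header format, and the `window`/`overlap` defaults below are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    doc_id: str
    page: int  # 1-based page number, kept so answers can be localized to a page
    text: str  # header + body: the string that would be embedded and reranked


def contextual_chunks(
    doc_id: str, title: str, pages: List[str], window: int = 200, overlap: int = 50
) -> List[Chunk]:
    """Split each page into overlapping word windows, each prefixed with document context."""
    if not 0 <= overlap < window:
        raise ValueError("need 0 <= overlap < window")
    step = window - overlap
    chunks: List[Chunk] = []
    for page_no, page_text in enumerate(pages, start=1):
        words = page_text.split()
        header = f"Document: {title} | Page {page_no}"
        # At least one chunk per page; later windows start every `step` words.
        for start in range(0, max(len(words) - overlap, 1), step):
            body = " ".join(words[start:start + window])
            chunks.append(Chunk(doc_id, page_no, f"{header}\n{body}"))
    return chunks
```

Chunks never cross page boundaries in this sketch. That choice keeps page localization trivial but may split a passage that spans two pages.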
Findings
On a held-out split, reranking improves Recall@1 from 0.6957 to 0.7935.
On the same split, using the top-2 reranked passages raises answer accuracy from 0.9348 to 0.9674.
The best run scored 0.9452 on the public leaderboard and 0.9598 on the private leaderboard.
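Recall@1 is the fraction of questions whose gold evidence is ranked first. The sketch below computes Recall@k, assuming a hit means the gold (document, page) pair appears among the top-k ranked candidates. The summary does not state whether the paper scores at page or passage level, so that matching rule and the `recall_at_k` name are assumptions.

```python
from typing import Hashable, List, Sequence


def recall_at_k(ranked: Sequence[List[Hashable]], gold: Sequence[Hashable], k: int = 1) -> float:
    """Fraction of questions whose gold item appears in the top-k of its ranked list.

    ranked: one ranked candidate list per question, e.g. [(doc_id, page), ...]
    gold:   the gold candidate for each question, aligned with `ranked`
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)
```

Comparing this value before and after reranking, with k=1, measures the effect that the Findings report.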
Abstract
We participated in the Fifth UNLP shared task on multi-domain document understanding, where systems must answer Ukrainian multiple-choice questions from PDF collections and localize the supporting document and page. We propose a retrieval-augmented pipeline built around three ideas: contextual chunking of PDFs, question-aware dense retrieval and reranking conditioned on both the question and answer options, and constrained answer generation from a small set of reranked passages. Our final system uses Qwen3-Embedding-8B for retrieval, a fine-tuned Qwen3-Reranker-8B for passage ranking, and Qwen3-32B for answer selection. On a held-out split, reranking improves Recall@1 from 0.6957 to 0.7935, while using the top-2 reranked passages raises answer accuracy from 0.9348 to 0.9674. Our best leaderboard run reached 0.9452 on the public leaderboard and 0.9598 on the private leaderboard. Our…
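Constrained answer generation from a few reranked passages can be sketched as prompt construction followed by decoding restricted to the option labels. The sketch assumes the answering LLM (Qwen3-32B in the paper) exposes next-token log-probabilities as a dict. That interface, the Latin option letters, the prompt wording, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def build_prompt(
    question: str, options: Sequence[str], passages: Sequence[str], top_k: int = 2
) -> Tuple[str, List[str]]:
    """Format the top_k reranked passages, the question, and lettered options into one prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages[:top_k]))
    letters = [chr(ord("A") + i) for i in range(len(options))]
    listed = "\n".join(f"{letter}. {opt}" for letter, opt in zip(letters, options))
    prompt = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Options:\n{listed}\n"
        f"Answer with a single letter: {', '.join(letters)}."
    )
    return prompt, letters


def constrained_choice(next_token_logprobs: Dict[str, float], letters: Sequence[str]) -> str:
    """Return the allowed letter with the highest log-probability; all other tokens are ignored."""
    return max(letters, key=lambda letter: next_token_logprobs.get(letter, float("-inf")))
```

Restricting the argmax to valid labels guarantees that the output is always one of the options, even when the model's single most likely token is something else.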
