Agri-Query: A Case Study on RAG vs. Long-Context LLMs for Cross-Lingual Technical Question Answering

Julius Gun; Timo Oksanen

arXiv:2508.18093 · cs.CL · March 9, 2026


TL;DR

This study evaluates large language models with 128K-token context windows on cross-lingual technical question answering over an agricultural machine manual. It compares direct long-context prompting with RAG strategies and finds that hybrid RAG performs best.

Contribution

It provides a detailed benchmark and analysis of long-context LLMs and RAG strategies for cross-lingual QA in a specialized industrial domain.

Findings

1. Hybrid RAG consistently outperforms direct long-context prompting.
2. Models like Gemini 2.5 Flash and Qwen 2.5 7B achieve over 85% accuracy with RAG.
3. The framework enables practical evaluation of LLMs in domain-specific scenarios.

Abstract

We present a case study evaluating large language models (LLMs) with 128K-token context windows on a technical question answering (QA) task. Our benchmark is built on a user manual for an agricultural machine, available in English, French, and German. It simulates a cross-lingual information retrieval scenario where questions are posed in English against all three language versions of the manual. The evaluation focuses on realistic "needle-in-a-haystack" challenges and includes unanswerable questions to test for hallucinations. We compare nine long-context LLMs using direct prompting against three Retrieval-Augmented Generation (RAG) strategies (keyword, semantic, hybrid), with an LLM-as-a-judge for evaluation. Our findings for this specific manual show that Hybrid RAG consistently outperforms direct long-context prompting. Models like Gemini 2.5 Flash and the smaller Qwen 2.5 7B…
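The abstract does not say how the hybrid strategy combines keyword and semantic retrieval. One common approach is to rank manual chunks with BM25 (keyword) and with an embedding model (semantic), then merge the two rankings with reciprocal rank fusion (RRF). The sketch below assumes that design. The function names `bm25_rank` and `reciprocal_rank_fusion`, the whitespace tokenizer, the BM25 defaults `k1=1.5` and `b=0.75`, the RRF constant `k=60`, the example chunks, and the hard-coded semantic ranking (standing in for a real multilingual embedding model) are all illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Keyword retrieval: return chunk indices sorted by Okapi BM25 score."""
    tokenized = [d.lower().split() for d in docs]  # naive whitespace tokenizer (assumption)
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of chunks containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            norm = k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * freq * (k1 + 1) / (freq + norm)
        scores.append(score)
    # sorted() is stable, so equal scores keep their original chunk order.
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Hybrid step: fuse rankings with RRF, score(d) = sum over rankings of 1 / (k + rank)."""
    fused = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=lambda d: fused[d], reverse=True)


# Hypothetical manual chunks (English and French), not taken from the benchmark.
chunks = [
    "Check the hydraulic oil level before operation.",
    "Contrôler le niveau d'huile hydraulique avant utilisation.",
    "Inflate the front tyres to the pressure given in the table.",
]
keyword = bm25_rank("hydraulic oil level", chunks)  # only the English chunk matches
semantic = [1, 0, 2]  # stand-in for a multilingual embedding ranking (hard-coded assumption)
fused = reciprocal_rank_fusion([keyword, semantic])
print(fused)  # [0, 1, 2]
```

With keyword ranking alone, the French chunk scores zero and ties with the unrelated tyre chunk. After fusion it sits clearly above that chunk. This is one plausible intuition for why a hybrid retriever can help when English questions target French or German text.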
