Retrieval-Augmented Long-Context Translation for Cultural Image Captioning: Gators submission for AmericasNLP 2026 shared task
Aashish Dhawan, Christopher Driggers-Ellis, Dzmitry Kasinets, Daisy Zhe Wang, Christan Grant

TL;DR
This paper describes a two-stage retrieval-augmented translation pipeline for cultural image captioning in Indigenous languages, achieving significant improvements and winning the shared task.
Contribution
It introduces a novel retrieval-augmented, two-stage translation approach for Indigenous language image captioning, demonstrating substantial performance gains and winning the shared task.
Findings
Retrieval benefits are highly language-dependent and effective mainly with large, in-domain corpora.
Synthetic data augmentation contributes approximately 28 chrF++ points to Guarani performance.
The proposed method outperforms the baseline by over 150% in dev and test evaluations for key languages.
Abstract
We present the University of Florida Gators submission to the AmericasNLP 2026 shared task on cultural image captioning for Indigenous languages. Our two-stage pipeline generates a Spanish intermediate caption with Qwen2.5-VL, then produces the target-language caption using retrieval-augmented many-shot prompting with Gemini 2.5 Flash. We achieve 164.1%, 131.7%, and 122.6% improvements over the shared task baseline for Bribri, Guaran\'i, and Orizaba Nahuatl captioning, respectively, in our dev set evaluation and maintain >150% improvements for the Bribri and Orizaba Nahuatl languages in the test set evaluation. We find retrieval is highly language-dependent, beneficial only for large, in-domain corpora, and that synthetic data augmentation accounts for around 28 chrF++ of the dev set Guaran\'i performance gain. Our submission is the overall winner of the shared task, placing second out…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
