Brotherhood at WMT 2024: Leveraging LLM-Generated Contextual Conversations for Cross-Lingual Image Captioning
Siddharth Betala, Ishan Chokshi

TL;DR
This paper introduces a novel cross-lingual image captioning method that uses large language models to generate contextual conversations, improving translation quality without traditional training.
Contribution
The authors propose leveraging instruction-tuned prompting of LLMs to create synthetic conversations for cross-lingual captioning, avoiding traditional training or fine-tuning.
Findings
Achieved a BLEU score of 37.90 on the English-Hindi challenge set.
Ranked first and second for English-Hausa on the task leaderboards.
Explored trade-offs between BLEU scores and semantic similarity.
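One reason BLEU and semantic similarity can pull in different directions is that BLEU rewards exact n-gram overlap, so a faithful paraphrase may still score low. The sketch below computes only the clipped (modified) n-gram precision at the core of BLEU, without the brevity penalty or the geometric mean over n-gram orders. It is not the shared task's scorer, and the example sentences are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision of a candidate against a single reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference. Tokenization is plain whitespace splitting.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    ref_counts = Counter(ngrams(reference.split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


# Illustrative sentences (not from the paper's data).
reference = "a man rides a horse"
paraphrase = "a person is riding a horse"

unigram_p = modified_precision(paraphrase, reference, 1)  # 3 of 6 unigrams match
bigram_p = modified_precision(paraphrase, reference, 2)   # 1 of 5 bigrams match
print(unigram_p, bigram_p)
```

The paraphrase keeps the meaning of the reference but shares only half of its unigrams and one bigram, which is the kind of gap a semantic similarity measure is meant to expose.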
Abstract
In this paper, we describe our system under the team name Brotherhood for the English-to-Lowres Multi-Modal Translation Task. We participate in the multi-modal translation tasks for English-Hindi, English-Hausa, English-Bengali, and English-Malayalam language pairs. We present a method leveraging multi-modal Large Language Models (LLMs), specifically GPT-4o and Claude 3.5 Sonnet, to enhance cross-lingual image captioning without traditional training or fine-tuning. Our approach utilizes instruction-tuned prompting to generate rich, contextual conversations about cropped images, using their English captions as additional context. These synthetic conversations are then translated into the target languages. Finally, we employ a weighted prompting strategy, balancing the original English caption with the translated conversation to generate captions in the target language. This method…
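The three stages described in the abstract (conversation generation, translation, weighted prompting) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `LLM` callable, the prompt wording, the function names, and the default `caption_weight` of 0.7 are all assumptions, and the cropped image that a real multi-modal call would attach is left out so the example stays self-contained.

```python
from typing import Callable

# Hypothetical interface: any callable mapping a prompt string to the model's reply.
# A real system would wrap a multi-modal API call and attach the cropped image.
LLM = Callable[[str], str]


def conversation_prompt(english_caption: str) -> str:
    """Stage 1: ask for a contextual conversation about the cropped image."""
    return (
        "Write a short, detailed conversation between two people about the "
        f"attached cropped image. Use this English caption as context: '{english_caption}'"
    )


def translation_prompt(conversation: str, target_language: str) -> str:
    """Stage 2: translate the synthetic conversation into the target language."""
    return f"Translate this conversation into {target_language}:\n{conversation}"


def caption_prompt(english_caption: str, translated_conversation: str,
                   target_language: str, caption_weight: float) -> str:
    """Stage 3: weighted prompt balancing the caption against the conversation."""
    if not 0.0 <= caption_weight <= 1.0:
        raise ValueError("caption_weight must be between 0 and 1")
    conversation_weight = 1.0 - caption_weight
    return (
        f"Write a caption in {target_language} for the image region.\n"
        f"Give about {caption_weight:.0%} importance to this English caption: "
        f"'{english_caption}'\n"
        f"Give about {conversation_weight:.0%} importance to this conversation:\n"
        f"{translated_conversation}"
    )


def generate_target_caption(llm: LLM, english_caption: str, target_language: str,
                            caption_weight: float = 0.7) -> str:
    """Run the three stages in order and return the target-language caption."""
    conversation = llm(conversation_prompt(english_caption))
    translated = llm(translation_prompt(conversation, target_language))
    return llm(caption_prompt(english_caption, translated, target_language, caption_weight))
```

Passing the model in as a plain callable keeps the sketch independent of any specific vendor SDK, so the same flow could be pointed at either of the models named in the abstract. The weight here is expressed only as wording inside the prompt, which is one possible reading of "weighted prompting".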
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
