Plug, Play, and Fuse: Zero-Shot Joint Decoding via Word-Level Re-ranking Across Diverse Vocabularies

Sai Koneru; Matthias Huck; Miriam Exel; Jan Niehues

arXiv:2408.11327 · cs.CL · November 5, 2024

TL;DR

This paper introduces a zero-shot, word-level re-ranking method that combines models with different vocabularies during decoding. This enables multimodal translation without additional training and improves translation quality.

Contribution

It presents a novel word-level re-ranking strategy for joint decoding of models with different vocabularies in a zero-shot setting, enhancing multimodal translation capabilities.
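The idea of re-ranking at the word level can be illustrated with a small sketch. This is not the authors' exact algorithm, and its details are not given on this page. It is a generic illustration built on a few assumptions:

- Each model exposes a hypothetical tokenizer that splits one word into that model's own subword units.
- Each model also exposes a hypothetical scorer that returns a token log-probability.

Both models can score the same sequence of whole words, even though their vocabularies differ. This means hypotheses can be re-ranked whenever they end on a complete word. The helper names `sequence_logprob` and `rerank_at_word_boundary`, the linear interpolation `weight`, and the toy character-level and whole-word models are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces (illustration only, not a real library API):
# a tokenizer splits ONE word into the model's own subword units, and a
# scorer returns log P(next_token | previous_tokens) under that model.
Tokenizer = Callable[[str], List[str]]
Scorer = Callable[[Sequence[str], str], float]


def sequence_logprob(words: List[str], tokenize: Tokenizer, score: Scorer) -> float:
    """Log-probability of a word sequence under one model.

    Each word is re-tokenized with the model's own vocabulary, so two models
    with different subword vocabularies can score the same words.
    """
    history: List[str] = []
    total = 0.0
    for word in words:
        for tok in tokenize(word):
            total += score(history, tok)
            history.append(tok)
    return total


def rerank_at_word_boundary(
    hypotheses: List[List[str]],
    primary: Tuple[Tokenizer, Scorer],
    secondary: Tuple[Tokenizer, Scorer],
    weight: float,
    beam_size: int,
) -> List[Tuple[List[str], float]]:
    """Re-rank hypotheses that end on a complete word and keep the best ones.

    The combined score linearly interpolates the two models' word-sequence
    log-probabilities; this combination rule is an assumption of the sketch.
    """
    scored = []
    for words in hypotheses:
        p = sequence_logprob(words, *primary)
        s = sequence_logprob(words, *secondary)
        scored.append((words, (1.0 - weight) * p + weight * s))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:beam_size]


# Toy primary model: character-level vocabulary ("a" is less likely than other chars).
char_model = (list, lambda prev, tok: -1.0 if tok == "a" else -0.5)
# Toy secondary model: whole-word vocabulary backed by a small lookup table.
word_table = {"a": -0.1, "cat": -0.2, "cot": -5.0}
word_model = (lambda w: [w], lambda prev, tok: word_table.get(tok, -10.0))

beams = [["a", "cot"], ["a", "cat"]]
# weight=0.0: only the character-level model counts; it prefers ["a", "cot"].
print(rerank_at_word_boundary(beams, char_model, word_model, weight=0.0, beam_size=1))
# weight=0.5: the word-level model overturns the choice in favour of ["a", "cat"].
print(rerank_at_word_boundary(beams, char_model, word_model, weight=0.5, beam_size=1))
```

A real decoder would not re-rank a fixed list once. It would call a step like this each time the primary model's beam hypotheses complete a word, so the second model's preferences influence which partial translations survive.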

Findings

1. Improves translation quality in multimodal scenarios
2. Enables integration of models with different vocabularies
3. Operates without additional training or fine-tuning

Abstract

Recent advancements in NLP have resulted in models with specialized strengths, such as processing multimodal inputs or excelling in specific domains. However, real-world tasks, like multimodal translation, often require a combination of these strengths, such as handling both translation and image processing. While individual translation and vision models are powerful, they typically lack the ability to perform both tasks in a single system. Combining these models poses challenges, particularly due to differences in their vocabularies, which restrict traditional ensemble methods to post-generation techniques such as N-best list re-ranking. In this work, we propose a novel zero-shot ensembling strategy that allows for the integration of different models during the decoding phase without the need for additional training. Our approach re-ranks beams during decoding by…

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification