Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral
António Farinhas, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, André F.T. Martins

TL;DR
This paper introduces a cascaded machine translation system that uses quality estimation metrics to selectively defer difficult instances to a larger model, matching that model's translation quality at a reduced computational cost.
Contribution
It proposes a simple deferral rule for cascaded translation systems based on existing quality estimation (QE) metrics, improving efficiency while maintaining translation quality.
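As a rough illustration of a QE-based deferral rule, here is a minimal sketch of a two-stage cascade. It assumes a reference-free quality estimation (QE) metric where higher scores mean better translations and a single fixed threshold. `cascade_translate`, `CascadeResult`, and the `toy_*` stand-ins are hypothetical names chosen for illustration, not the paper's code or any library's API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces (not from the paper): a translator maps a source
# sentence to a translation; a reference-free QE metric maps
# (source, hypothesis) to a score where higher means better.
Translator = Callable[[str], str]
QEMetric = Callable[[str, str], float]


@dataclass
class CascadeResult:
    translation: str
    qe_score: float  # QE score of the small model's draft
    deferred: bool   # True if the larger model was invoked


def cascade_translate(source: str, small: Translator, large: Translator,
                      qe: QEMetric, threshold: float) -> CascadeResult:
    """Keep the small model's draft unless its QE score falls below
    `threshold`; in that case, retranslate with the larger model."""
    draft = small(source)
    score = qe(source, draft)
    if score < threshold:
        return CascadeResult(large(source), score, deferred=True)
    return CascadeResult(draft, score, deferred=False)


# Toy stand-ins for illustration only: the "small model" echoes its input,
# the "large model" tags it, and the toy QE metric penalizes long inputs.
def toy_small(src: str) -> str:
    return src


def toy_large(src: str) -> str:
    return f"[large] {src}"


def toy_qe(src: str, hyp: str) -> float:
    return 0.9 if len(src.split()) <= 3 else 0.4


if __name__ == "__main__":
    for s in ["guten Tag", "this sentence has many more words"]:
        r = cascade_translate(s, toy_small, toy_large, toy_qe, threshold=0.5)
        print(r.deferred, r.translation)
```

The larger model is only called on the deferred branch, so its cost is paid only for inputs whose draft scores poorly; the small model and the QE metric still run on every input.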
Findings
QE-based deferral matches the performance of the larger model
Invokes the larger model for only 30% to 50% of examples
Validated through automatic and human evaluations
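The findings describe deferring 30% to 50% of examples, but this summary does not say how the QE threshold is chosen. One plausible approach, stated here as an assumption rather than the paper's procedure, is to calibrate the threshold on QE scores from a held-out set so that a target fraction of examples falls below it. `threshold_for_budget` and `deferral_rate` are hypothetical helpers, and the dev scores are made up.

```python
from typing import Sequence


def threshold_for_budget(dev_scores: Sequence[float], target_rate: float) -> float:
    """Pick a QE threshold so that deferring every example with score < threshold
    defers about `target_rate` of the dev set (exactly, when scores are distinct)."""
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must be in [0, 1]")
    ranked = sorted(dev_scores)
    k = round(target_rate * len(ranked))  # how many examples to defer
    if k >= len(ranked):
        return float("inf")  # defer everything
    return ranked[k]  # the k lowest scores fall strictly below this value


def deferral_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of examples whose QE score falls below `threshold`."""
    if not scores:
        return 0.0
    return sum(s < threshold for s in scores) / len(scores)


if __name__ == "__main__":
    # Made-up QE scores for ten dev examples (illustration only).
    dev = [0.91, 0.42, 0.77, 0.35, 0.88, 0.63, 0.50, 0.95, 0.29, 0.70]
    for rate in (0.3, 0.5):
        t = threshold_for_budget(dev, rate)
        print(rate, t, deferral_rate(dev, t))
```

A threshold calibrated this way fixes the deferral rate only on the calibration set; on new data the realized rate will drift if the QE score distribution shifts.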
Abstract
Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
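Where the savings come from can be made explicit with back-of-the-envelope accounting. This assumes per-example costs simply add up, that the small model and the QE metric run on every input, and that deferred inputs are retranslated from scratch by the larger model. The symbols $c_S$, $c_Q$, $c_L$ (per-example cost of the small model, the QE metric, and the larger model) and $d$ (deferral rate) are introduced here for illustration and are not taken from the paper.

```latex
\mathbb{E}[C_{\text{cascade}}] = c_S + c_Q + d\,c_L,
\qquad
\frac{\mathbb{E}[C_{\text{cascade}}]}{c_L} = d + \frac{c_S + c_Q}{c_L}.
```

For example, with hypothetical costs where $c_S + c_Q = 0.1\,c_L$, deferral rates of $d = 0.3$ and $d = 0.5$ give $0.4$ and $0.6$ of the cost of always running the larger model. The overhead term $(c_S + c_Q)/c_L$ is what the small model and QE metric must keep small for the cascade to pay off.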
Taxonomy
Topics: Natural Language Processing Techniques
