Visualizing Uncertainty in Translation Tasks: An Evaluation of LLM Performance and Confidence Metrics
Jin Hyun Park, Utsawb Laminchhane, Umer Farooq, Uma Sivakumar, Arpan Kumar

TL;DR
This paper introduces novel uncertainty metrics and a visualization tool for LLM-based translation, improving interpretability and trust by providing token-level confidence insights and a clear visual representation of translation uncertainties.
Contribution
It presents three new uncertainty quantification metrics and a web-based visualization tool, enhancing interpretability of LLM translation outputs compared to prior methods.
Findings
UQ metrics correlate linearly with traditional evaluation scores
Visualization effectively shows token confidence levels
Metrics are robust and interpretable
Abstract
Large language models (LLMs) are increasingly utilized for machine translation, yet their predictions often exhibit uncertainties that hinder interpretability and user trust. Effectively visualizing these uncertainties can enhance the usability of LLM outputs, particularly in contexts where translation accuracy is critical. This paper addresses two primary objectives: (1) providing users with token-level insights into model confidence and (2) developing a web-based visualization tool to quantify and represent translation uncertainties. To achieve these goals, we utilized the T5 model with the WMT19 dataset for translation tasks and evaluated translation quality using established metrics such as BLEU, METEOR, and ROUGE. We introduced three novel uncertainty quantification (UQ) metrics: (1) the geometric mean of token probabilities, (2) the arithmetic mean of token probabilities, and (3)…
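As an illustration of the first two metrics named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each generated token comes with a probability in [0, 1] (for example, the softmax probability of the chosen token); the function names and the sample probabilities are hypothetical. The third metric is truncated in the text above and is not reconstructed here.

```python
import math
from typing import Sequence


def _validate(token_probs: Sequence[float]) -> None:
    # Hypothetical helper: reject empty input and values outside [0, 1].
    if len(token_probs) == 0:
        raise ValueError("token_probs must contain at least one probability")
    if any(p < 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("every token probability must lie in [0, 1]")


def arithmetic_mean_confidence(token_probs: Sequence[float]) -> float:
    """Arithmetic mean of token probabilities for one translated sentence."""
    _validate(token_probs)
    return sum(token_probs) / len(token_probs)


def geometric_mean_confidence(token_probs: Sequence[float]) -> float:
    """Geometric mean of token probabilities, computed in log space."""
    _validate(token_probs)
    # A single zero-probability token drives the geometric mean to zero.
    if any(p == 0.0 for p in token_probs):
        return 0.0
    # Averaging logs avoids underflow when multiplying many small numbers.
    mean_log = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(mean_log)


# Hypothetical per-token probabilities for a three-token translation.
example_probs = [0.9, 0.8, 0.1]
arith = arithmetic_mean_confidence(example_probs)
geo = geometric_mean_confidence(example_probs)
print(f"arithmetic mean: {arith:.3f}")
print(f"geometric mean:  {geo:.3f}")
```

The geometric mean never exceeds the arithmetic mean, so a single low-confidence token lowers it more sharply. That difference may be one reason to report both scores side by side.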
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling
Methods: Attention Is All You Need · Byte Pair Encoding · Softmax · Gated Linear Unit · SentencePiece · Residual Connection · Dropout · Linear Layer · Attention Dropout
