Scene Graph as Pivoting: Inference-time Image-free Unsupervised Multimodal Machine Translation with Visual Scene Hallucination
Hao Fei, Qian Liu, Meishan Zhang, Min Zhang, Tat-Seng Chua

TL;DR
This paper introduces an inference-time image-free unsupervised multimodal machine translation (UMMT) approach. It uses scene graphs and visual scene hallucination so that images are needed only during training, not at inference time.
Contribution
It proposes a novel scene graph pivoting method with visual hallucination for image-free multimodal translation, enhancing translation quality in an unsupervised setting.
Findings
Outperforms the best-performing baseline in BLEU on Multi30K.
Achieves better translation completeness, relevance, and fluency.
Enables image-free inference with improved semantic understanding.
Abstract
In this work, we investigate a more realistic unsupervised multimodal machine translation (UMMT) setup, inference-time image-free UMMT, where the model is trained with source-text image pairs, and tested with only source-text inputs. First, we represent the input images and texts with the visual and language scene graphs (SG), where such fine-grained vision-language features ensure a holistic understanding of the semantics. To enable pure-text input during inference, we devise a visual scene hallucination mechanism that dynamically generates pseudo visual SG from the given textual SG. Several SG-pivoting based learning objectives are introduced for unsupervised translation training. On the benchmark Multi30K data, our SG-based method outperforms the best-performing baseline by a significant BLEU margin on the task and setup, helping yield translations with better completeness, relevance…
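The abstract describes visual scene hallucination only at a high level: a pseudo visual SG is generated from the textual SG so that no image is needed at inference. The sketch below illustrates that data flow under loudly stated assumptions. It assumes a scene graph made of object nodes, per-object attributes, and (subject, predicate, object) triples, and it replaces the paper's learned generator with a hand-written co-occurrence table. `SceneGraph`, `hallucinate_visual_sg`, and `HYPOTHETICAL_COOCCURRENCE` are hypothetical names; this is not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    """Minimal scene graph: object nodes, per-object attributes,
    and (subject, predicate, object) relation triples."""
    objects: set[str] = field(default_factory=set)
    attributes: dict[str, set[str]] = field(default_factory=dict)
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def add_relation(self, subj: str, pred: str, obj: str) -> None:
        # Adding a relation also registers both endpoints as object nodes.
        self.objects.update((subj, obj))
        self.relations.add((subj, pred, obj))


# HYPOTHETICAL: a fixed lookup table standing in for a learned generator
# that predicts visual content an image would likely show but the text omits.
HYPOTHETICAL_COOCCURRENCE = {
    "dog": [("wearing", "collar")],
    "beach": [("near", "water")],
}


def hallucinate_visual_sg(text_sg: SceneGraph,
                          cooccurrence=HYPOTHETICAL_COOCCURRENCE) -> SceneGraph:
    """Toy stand-in for visual scene hallucination: copy the textual SG,
    then add extra object/relation nodes suggested by the lookup table."""
    visual = SceneGraph(
        objects=set(text_sg.objects),
        attributes={k: set(v) for k, v in text_sg.attributes.items()},
        relations=set(text_sg.relations),
    )
    for obj in sorted(text_sg.objects):
        for pred, extra in cooccurrence.get(obj, []):
            visual.add_relation(obj, pred, extra)
    return visual


# Usage: textual SG for "a brown dog running on the beach".
text_sg = SceneGraph()
text_sg.add_relation("dog", "running on", "beach")
text_sg.attributes["dog"] = {"brown"}
visual_sg = hallucinate_visual_sg(text_sg)
print(sorted(visual_sg.relations))
# [('beach', 'near', 'water'), ('dog', 'running on', 'beach'), ('dog', 'wearing', 'collar')]
```

The textual SG is copied rather than mutated, so both graphs remain available, loosely mirroring how SG-pivoting objectives could relate the textual graph to its pseudo visual counterpart.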
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
