Hermes the Polyglot: A Unified Framework to Enhance Expressiveness for Multimodal Interlingual Subtitling

Chaoqun Cui; Shijing Wang; Liangbin Huang; Qingqing Gu; Zhaolong Huang; Xiao Zeng; Wenji Mao

arXiv:2602.00597 · cs.CL · February 3, 2026

Open Access

TL;DR

Hermes is a novel LLM-based framework designed to improve the quality and expressiveness of interlingual subtitles by addressing semantic coherence, pronoun translation, and terminology issues through specialized modules.

Contribution

The paper introduces Hermes, a unified LLM-based framework whose three modules (Speaker Diarization, Terminology Identification, and Expressiveness Enhancement) target semantic coherence, pronoun and terminology translation, and expressiveness in interlingual subtitling.

Findings

01 Achieves state-of-the-art speaker diarization performance

02 Generates more expressive and contextually coherent translations

03 Addresses semantic coherence, pronoun and terminology translation, and expressiveness, the main challenges of LLM-based interlingual subtitling

Abstract

Interlingual subtitling, which translates subtitles of visual media into a target language, is essential for entertainment localization but has not yet been explored in machine translation. Although Large Language Models (LLMs) have significantly advanced the general capabilities of machine translation, the distinctive characteristics of subtitle texts pose persistent challenges in interlingual subtitling, particularly regarding semantic coherence, pronoun and terminology translation, and translation expressiveness. To address these issues, we present Hermes, an LLM-based automated subtitling framework. Hermes integrates three modules: Speaker Diarization, Terminology Identification, and Expressiveness Enhancement, which effectively tackle the above challenges. Experiments demonstrate that Hermes achieves state-of-the-art diarization performance and generates expressive, contextually…

Taxonomy

Topics: Subtitles and Audiovisual Media · Translation Studies and Practices · Multimodal Machine Learning Applications