From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization
Chaoqun Cui, Shijing Wang, Liangbin Huang, Qingqing Gu, Zhaolong Huang, Xiao Zeng, Wenji Mao

TL;DR
This paper presents a novel training method called Adaptive Local Preference Optimization (ALPO) for enhancing the expressiveness and vividness of subtitle translation LLMs, addressing domain-specific customization challenges.
Contribution
It introduces ALPO, a new optimization technique for fine-grained preference alignment in domain-specific translation LLMs, and provides a multidirectional subtitle parallel corpus dataset.
Findings
ALPO outperforms existing methods in multidimensional translation quality evaluation.
The constructed subtitle corpus supports domain-specific translation training.
LLMs can be effectively used as reward models and evaluators for translation quality.
Abstract
The rapid development of Large Language Models (LLMs) has significantly enhanced the general capabilities of machine translation. However, as application scenarios become more complex, the limitations of LLMs in vertical domain translations are gradually becoming apparent. In this study, we focus on how to construct translation LLMs that meet the needs of domain customization. We take visual media subtitle translation as our topic and explore how to train expressive and vivid translation LLMs. We investigated literal versus liberal translation in subtitle translation and other domains, and verified the reliability of LLMs as reward models and evaluators for translation. Additionally, to train an expressive translation LLM, we constructed and released a multidirectional subtitle parallel corpus dataset and proposed the Adaptive Local Preference Optimization (ALPO) method to…
Peer Reviews
Decision: ICLR 2026 Poster
This paper has many positive contributions. The analysis of domains on literal vs. liberal translation is quite nice. There are numerous experiments on recent models, covering many modern SOTA LLMs. The proposed PO algorithm is new and potentially interesting. Table 3 shows empirically that we cannot always beat humans (which makes sense, since they likely use more modalities to translate as well). There is a human evaluation in Section 5.3.
The proposed method is a novel PO algorithm, but the comparisons are only to other models, not to other PO methods. The authors claim that local optimization is better, and thus propose ALPO rather than a method that uses full-sequence outputs, such as DPO. However, this is never empirically shown (as far as I can tell). The differentiation of the subtitle translation task from OpenSubtitles (Tiedemann 2016; Lison and Tiedemann 2016) is presented as a core claim, but is only discussed in Section B.2.2.
1. The manuscript is well written, and I could follow the logic and understand the background. 2. Comprehensive benchmarks and existing LLMs are evaluated in the experiments. 3. The problem of subtitle translation (or, more generally, machine translation in diverse tones and scenarios) is an important and emerging problem, as SOTA LLMs have now reached very strong performance on literal translation.
1. Although the topic is interesting and the problem is important, the evaluation method is somewhat weak and lacks soundness. See my questions for specific points below. 2. The technical challenges of subtitle translation, such as context dependency and segmentation, are not discussed in depth. Instead, the point about liberal translation that the authors highlight is vague in this manuscript: for example, there is a gap between lower BLEU and higher liberal translation, this could be simply ca
The paper tackles a real-world application scenario—subtitle translation often benefits from more liberal renderings—and sets up a clean evaluation around that goal. The method is straightforward and well engineered: segment-level preference optimization with an LLM-as-judge, some human validation, and clear axes (accuracy/naturalness/vividness). The ablations are useful and suggest the components (e.g., segment gating, prefix mixing) actually matter. Empirically, the ALPO-trained model shows co
Novelty, missing references, and baselines: The work feels largely application-driven, and the methodological novelty is unclear beyond packaging known components (segment-level preference optimization with an LLM-as-judge) for subtitles. In particular, there is no head-to-head comparison against fine-grained preference-learning baselines that operate below the sequence level, even though such baselines are directly comparable to a segment-level method. At minimum, it should include
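To make the sequence-level versus segment-level distinction raised in the reviews concrete, here is a minimal sketch in plain Python. It implements the standard DPO loss for one preference pair and a hypothetical variant that restricts the same loss to a token span. This is not the authors' ALPO objective, whose details are not given on this page; the function names, the span arguments, and the value of `beta` are illustrative assumptions.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(policy_chosen, ref_chosen, policy_rejected, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is a list of per-token log-probabilities; the loss
    compares summed log-ratios of policy to reference model.
    """
    chosen_ratio = sum(policy_chosen) - sum(ref_chosen)
    rejected_ratio = sum(policy_rejected) - sum(ref_rejected)
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


def segment_dpo_loss(policy_chosen, ref_chosen, policy_rejected, ref_rejected,
                     chosen_span, rejected_span, beta=0.1):
    """Hypothetical local variant (an assumption, not ALPO): apply the same
    loss only to the tokens inside the given (start, end) spans."""
    cs, ce = chosen_span
    rs, re_ = rejected_span
    return dpo_loss(policy_chosen[cs:ce], ref_chosen[cs:ce],
                    policy_rejected[rs:re_], ref_rejected[rs:re_], beta)


# Toy log-probabilities: policy and reference differ only at token index 1.
policy_chosen = [-1.0, -0.5, -2.0]
ref_chosen = [-1.0, -1.0, -2.0]
policy_rejected = [-1.0, -1.5, -2.0]
ref_rejected = [-1.0, -1.0, -2.0]

full = dpo_loss(policy_chosen, ref_chosen, policy_rejected, ref_rejected)
on_segment = segment_dpo_loss(policy_chosen, ref_chosen, policy_rejected,
                              ref_rejected, (1, 2), (1, 2))
off_segment = segment_dpo_loss(policy_chosen, ref_chosen, policy_rejected,
                               ref_rejected, (0, 1), (0, 1))
```

In this toy case the full-sequence loss and the loss on the span covering token 1 coincide, because all of the preference signal sits in that span, while a span that misses it yields the uninformative value log 2. A segment-level method of the kind the reviewers describe would choose spans where a judge finds a meaningful difference; how ALPO does so is not specified here.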
Taxonomy
Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling
