SignDPO: Multi-level Direct Preference Optimisation for Skeleton-based Gloss-free Sign Language Translation
Muxin Pu, Xiao-Ming Wu, Mei Kuan Lim, Chun Yong Chong, Wei Li, Chen Change Loy

TL;DR
SignDPO is a multi-level preference optimisation framework for skeleton-based, gloss-free sign language translation. It replaces simple sequence mimicry with structured preference alignment across spatial, temporal, and linguistic dimensions.
Contribution
The paper proposes a novel multi-level DPO framework with hierarchical perturbation, semantic saliency guidance, and automated preference generation for sign language translation.
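To make the idea of hierarchical perturbation concrete, here is a minimal Python sketch of how non-preferred skeleton sequences could be built at global and local granularity. The summary does not specify the paper's actual perturbation operators, so everything here is an assumption for illustration: the function names `perturb_temporal` and `perturb_spatial`, frame shuffling as the temporal corruption, Gaussian jitter as the spatial corruption, and the representation of a sequence as a list of frames holding (x, y) joint tuples.

```python
import random


def perturb_temporal(seq, rng, local_window=None):
    """Return a copy of seq with frame order corrupted.

    Assumed operator: shuffle all frames (global) or only the frames
    inside one randomly placed window (local). Not the paper's method.
    """
    out = list(seq)
    if local_window is None:
        rng.shuffle(out)  # global: reorder the whole sequence
    else:
        start = rng.randrange(0, max(1, len(out) - local_window + 1))
        segment = out[start:start + local_window]
        rng.shuffle(segment)  # local: reorder one short window only
        out[start:start + local_window] = segment
    return out


def perturb_spatial(seq, rng, joint_ids=None, sigma=0.05):
    """Return a copy of seq with joint coordinates jittered.

    Assumed operator: Gaussian noise on every joint (global) or only on
    the joints listed in joint_ids (local, e.g. one hand).
    """
    out = []
    for frame in seq:
        new_frame = []
        for j, (x, y) in enumerate(frame):
            if joint_ids is None or j in joint_ids:
                x += rng.gauss(0.0, sigma)
                y += rng.gauss(0.0, sigma)
            new_frame.append((x, y))
        out.append(new_frame)
    return out
```

In this sketch the original sequence would act as the preferred sample and each perturbed copy as a non-preferred one. A random shuffle can occasionally return the original order, so a real pipeline would probably need to check that the perturbed sample actually differs.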
Findings
SignDPO outperforms state-of-the-art gloss-free methods on three benchmarks.
The framework effectively captures fine-grained spatio-temporal nuances.
Results are competitive with gloss-based translation methods.
Abstract
We present SignDPO, a novel multi-level Direct Preference Optimisation (DPO) framework designed to enhance the alignment of skeleton-based Sign Language Translation. While current skeleton-based models have made significant progress using Maximum Likelihood Estimation, they are primarily constrained by an imitation-based paradigm that lacks discriminative sensitivity to the fine-grained spatio-temporal nuances of sign language, often leading to semantic drift. To address this, SignDPO shifts the optimisation goal from simple sequence mimicry to structured preference alignment across spatial, temporal, and linguistic dimensions. Our framework involves three key designs. First, we introduce a hierarchical perturbation strategy to construct spatial and temporal non-preferred samples at both global and local granularities automatically. Second, we propose a self-guiding mechanism that…
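For readers unfamiliar with DPO, the following Python sketch shows the standard pairwise DPO loss and one way a multi-level objective could aggregate it. The single-pair loss follows the widely used DPO formulation. The `multi_level_dpo_loss` helper, its level names, and the plain weighted sum are assumptions, since the truncated abstract does not say how SignDPO combines its spatial, temporal, and linguistic terms.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one (preferred, non-preferred) pair.

    Inputs are sequence log-probabilities under the trained policy and
    a frozen reference model; beta scales the implicit reward margin.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) is the same as softplus(-margin)
    return softplus(-margin)


def multi_level_dpo_loss(pairs_by_level, weights=None, beta=0.1):
    """Hypothetical aggregation: weighted sum of per-level DPO losses.

    pairs_by_level maps a level name (e.g. "spatial") to a 4-tuple of
    log-probabilities in the argument order of dpo_loss.
    """
    weights = weights or {}
    total = 0.0
    for level, logps in pairs_by_level.items():
        total += weights.get(level, 1.0) * dpo_loss(*logps, beta=beta)
    return total
```

When the policy and reference assign identical log-probabilities, the margin is zero and the loss equals log 2. The loss falls below that value as the policy shifts probability toward the preferred sample relative to the reference.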
