HI-TransPA: Hearing Impairments Translation Personal Assistant
Zhiming Ma, Shiyu Gan, Junhao Zhao, Xianming Li, Qingyun Pan, Peidong Wang, Mingjun Pan, Yuhao Mo, Jiajie Cheng, Chengxin Chen, Zhonglun Cao, Chonghan Liu, Shi Cheng

TL;DR
HI-TransPA is a multimodal assistive system that translates and interprets hearing-impaired speech by combining audio and lip movement analysis, improving communication for the hearing-impaired.
Contribution
The paper introduces a novel Omni-Model framework for assistive technology, built around a unified 3D-Resampler, together with a quality-guided curriculum learning strategy for translating hearing-impaired speech.
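This page does not explain how the unified 3D-Resampler works. The sketch below is only an illustration and assumes a Perceiver-style resampler: a fixed set of query vectors cross-attends over visual tokens flattened across time, height and width, so a clip of any length is compressed to the same number of tokens. The function names, the single attention head, the absence of learned projections, and the random toy inputs are all hypothetical. This is not HI-TransPA's implementation.

```python
import math
import random
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_3d(video_feats: List[List[List[Vector]]]) -> List[Vector]:
    # (T, H, W, D) nested lists -> (T*H*W, D) list of tokens.
    return [tok for frame in video_feats for row in frame for tok in row]


def resample(queries: List[Vector], video_feats: List[List[List[Vector]]]) -> List[Vector]:
    """Hypothetical single-head cross-attention resampler.

    Keys and values are the raw spatio-temporal tokens (no learned
    projections); each query yields one output token.
    """
    tokens = flatten_3d(video_feats)
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in tokens]
        weights = softmax(scores)
        # Attention-weighted sum of value vectors.
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out


# Toy example: 8 frames of a 4x4 feature grid with 16-dim features.
random.seed(0)
T, H, W, D, N_QUERIES = 8, 4, 4, 16, 6
video = [[[[random.gauss(0, 1) for _ in range(D)] for _ in range(W)]
          for _ in range(H)] for _ in range(T)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N_QUERIES)]
compressed = resample(queries, video)
print(len(compressed), len(compressed[0]))  # 6 16
```

Here 128 spatio-temporal tokens become 6 output tokens. Doubling the number of frames would leave the output size unchanged, which is why a resampler of this kind suits variable-length lip video.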
Findings
Achieves state-of-the-art accuracy and semantic fidelity on the HI-Dialogue dataset.
Develops a multimodal preprocessing pipeline for lip and facial landmark detection.
Demonstrates robustness through curriculum learning guided by sample quality scores.
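The lip-region stabilization step of the preprocessing pipeline can be illustrated with a minimal sketch. The sketch makes several assumptions. A landmark detector (not shown, hypothetical) returns the mouth points for each frame, or None when detection fails. The crop centre is the mouth centroid smoothed with an exponential moving average. The crop size (96 px) and smoothing weight (0.3) are arbitrary choices and are not values from the paper.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


def centroid(points: Sequence[Point]) -> Point:
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)


def stabilized_lip_boxes(
    mouth_landmarks: List[Optional[Sequence[Point]]],
    frame_w: int,
    frame_h: int,
    crop: int = 96,
    alpha: float = 0.3,
) -> List[Box]:
    """One fixed-size crop box per frame, centred on an EMA-smoothed mouth centroid.

    Assumes crop <= frame_w and crop <= frame_h. Frames with no detection
    (None or empty) reuse the previous smoothed centre.
    """
    half = crop // 2
    center: Optional[Point] = None
    boxes: List[Box] = []
    for pts in mouth_landmarks:
        if pts:
            mx, my = centroid(pts)
            if center is None:
                center = (mx, my)
            else:
                # alpha weights the newest observation; 1 - alpha weights the history.
                center = (alpha * mx + (1 - alpha) * center[0],
                          alpha * my + (1 - alpha) * center[1])
        # Before the first detection, fall back to the frame centre.
        bx, by = center if center is not None else (frame_w / 2, frame_h / 2)
        # Clamp the top-left corner so the whole box stays inside the frame.
        x0 = min(max(int(round(bx)) - half, 0), frame_w - crop)
        y0 = min(max(int(round(by)) - half, 0), frame_h - crop)
        boxes.append((x0, y0, x0 + crop, y0 + crop))
    return boxes


# Toy mouth track with one dropped detection.
track = [
    [(300, 400), (340, 400), (320, 420)],
    None,
    [(310, 398), (350, 398), (330, 418)],
]
boxes = stabilized_lip_boxes(track, frame_w=640, frame_h=480)
for box in boxes:
    print(box)
```

Smoothing the centre, rather than cropping each frame at its raw centroid, keeps small detector jitter from showing up as apparent lip motion. The trade-off is a short lag when the head actually moves.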
Abstract
Hearing-impaired individuals often face significant barriers in daily communication due to the inherent challenges of producing clear speech. To address this, we introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, an instruction-driven audio-visual personal assistant. The model fuses indistinct speech with lip dynamics, enabling both translation and dialogue within a single multimodal framework. To address the distinctive pronunciation patterns of hearing-impaired speech and the limited adaptability of existing models, we develop a multimodal preprocessing and curation pipeline that detects facial landmarks, stabilizes the lip region, and quantitatively evaluates sample quality. These quality scores guide a curriculum learning strategy that first trains on clean, high-confidence samples and progressively incorporates harder cases to strengthen model…
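The quality-guided curriculum described in the abstract can be sketched as a pacing schedule over quality-ranked data. This is a minimal sketch with several assumptions. Each sample is assumed to carry a scalar quality score, where higher means cleaner. The training pool is assumed to grow linearly from the top 30% of samples to the full set over three epochs. The paper's actual schedule, thresholds and scoring are not given here, and every name below is hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, float]  # (sample_id, quality score; higher = cleaner)


def pacing_fraction(epoch: int, ramp_epochs: int, start_frac: float) -> float:
    """Fraction of the quality-sorted data admitted at a given epoch (linear ramp)."""
    if ramp_epochs <= 0:
        return 1.0
    return min(1.0, start_frac + (1.0 - start_frac) * epoch / ramp_epochs)


def curriculum_pools(
    samples: Sequence[Sample],
    num_epochs: int,
    ramp_epochs: int = 3,
    start_frac: float = 0.3,
    seed: int = 0,
) -> List[List[str]]:
    """Return the shuffled sample ids to train on in each epoch."""
    # Easiest first: highest quality score at the front.
    ranked = sorted(samples, key=lambda s: s[1], reverse=True)
    rng = random.Random(seed)
    pools = []
    for epoch in range(num_epochs):
        frac = pacing_fraction(epoch, ramp_epochs, start_frac)
        k = max(1, math.ceil(frac * len(ranked)))
        pool = [sid for sid, _ in ranked[:k]]
        rng.shuffle(pool)  # shuffle only within the admitted subset
        pools.append(pool)
    return pools


data = [("a", 0.95), ("b", 0.40), ("c", 0.80), ("d", 0.10), ("e", 0.65),
        ("f", 0.90), ("g", 0.30), ("h", 0.55), ("i", 0.20), ("j", 0.75)]
pools = curriculum_pools(data, num_epochs=5)
for epoch, pool in enumerate(pools):
    print(epoch, len(pool), sorted(pool))
```

A linear ramp is only one option. A step schedule that admits fixed quality tiers (for example clean, medium, then hard) would fit the same description equally well.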
Taxonomy
Topics: Speech and Audio Processing · Face recognition and analysis · Social Robot Interaction and HRI
