Cued-Agent: A Collaborative Multi-Agent System for Automatic Cued Speech Recognition
Guanjie Huang, Danny H.K. Tsang, Shan Yang, Guangzhi Lei, Li Liu

TL;DR
This paper introduces Cued-Agent, a novel collaborative multi-agent system that significantly improves automatic cued speech recognition by integrating specialized agents for hand and lip recognition, dynamic prompt decoding, and semantic refinement, especially under limited data conditions.
Contribution
The paper presents the first multi-agent system for ACSR, combining specialized agents to improve recognition accuracy and robustness when training data is limited.
Findings
Outperforms state-of-the-art methods in recognition accuracy.
Effectively handles multimodal fusion with limited data.
Enables end-to-end phoneme-to-word conversion.
Abstract
Cued Speech (CS) is a visual communication system that combines lip-reading with hand coding to facilitate communication for individuals with hearing impairments. Automatic CS Recognition (ACSR) aims to convert CS hand gestures and lip movements into text via AI-driven methods. Traditionally, the temporal asynchrony between hand and lip movements requires the design of complex modules to facilitate effective multimodal fusion. However, constrained by limited data availability, current methods demonstrate insufficient capacity for adequately training these fusion mechanisms, resulting in suboptimal performance. Recently, multi-agent systems have shown promising capabilities in handling complex tasks with limited data availability. To this end, we propose the first collaborative multi-agent system for ACSR, named Cued-Agent. It integrates four specialized sub-agents: a Multimodal Large…
Taxonomy
Topics: Speech and Audio Processing · Hand Gesture Recognition Systems · Face recognition and analysis
