TalkLoRA: Low-Rank Adaptation for Speech-Driven Animation
Jack Saunders, Vinay Namboodiri

TL;DR
TalkLoRA introduces a low-rank adaptation method for transformer-based speech-driven facial animation, enabling efficient style personalization and faster inference for long sentences without quality loss.
Contribution
The paper presents TalkLoRA, a novel low-rank adaptation approach that improves style adaptation and reduces inference complexity in transformer-based speech-driven animation.
Findings
Achieves state-of-the-art style adaptation performance.
Reduces inference complexity by an order of magnitude.
Maintains high animation quality with limited data.
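The complexity finding can be illustrated with a rough count of attention pairs. This is a minimal sketch, not the paper's implementation: it assumes fixed-size, non-overlapping chunks and counts only pairwise self-attention interactions. The function names and the chunk size are hypothetical choices made for illustration.

```python
def full_attention_pairs(n_frames: int) -> int:
    """Pairwise interactions when every frame attends to every frame."""
    return n_frames * n_frames


def chunked_attention_pairs(n_frames: int, chunk_size: int) -> int:
    """Pairwise interactions when attention is restricted to each chunk.

    Assumes non-overlapping chunks; the last chunk may be shorter.
    """
    full_chunks, remainder = divmod(n_frames, chunk_size)
    return full_chunks * chunk_size ** 2 + remainder ** 2


def split_into_chunks(frames, chunk_size):
    """Split a sequence of frames into consecutive chunks."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]


# Hypothetical numbers: 1000 frames processed in chunks of 100.
full = full_attention_pairs(1000)
chunked = chunked_attention_pairs(1000, 100)
speedup = full / chunked
```

With these illustrative numbers the full model computes 1,000,000 pairs and the chunked one 100,000, a factor of 10. The actual saving depends on sequence length and chunk size, neither of which is stated in this summary.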
Abstract
Speech-driven facial animation is important for many applications including TV, film, video games, telecommunication and AR/VR. Recently, transformers have been shown to be extremely effective for this task. However, we identify two issues with the existing transformer-based models. Firstly, they are difficult to adapt to new personalised speaking styles and secondly, they are slow to run for long sentences due to the quadratic complexity of the transformer. We propose TalkLoRA to address both of these issues. TalkLoRA uses Low-Rank Adaptation to effectively and efficiently adapt to new speaking styles, even with limited data. It does this by training an adaptor with a small number of parameters for each subject. We also utilise a chunking strategy to reduce the complexity of the underlying transformer, allowing for long sentences at inference time. TalkLoRA can be applied to any…
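The per-subject adaptor described in the abstract can be sketched with the standard Low-Rank Adaptation formulation, in which a frozen weight W is augmented by a trainable low-rank product: y = Wx + (alpha / r) B(Ax). This is a sketch under assumptions: the rank, the scaling factor, the initialisation, and which transformer layers are adapted are not given in this summary, and the class name `LoRALinear` is hypothetical.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update (illustrative)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight          # frozen base weights, shared by all subjects
        self.scale = alpha / rank     # common LoRA scaling (an assumption here)
        # A starts with small random values, B with zeros, so the adaptor
        # leaves the base model unchanged before any training.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_adapter_params(self):
        """Trainable parameters per subject: rank * (d_in + d_out)."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Tiny example: a 3x4 base layer with a rank-2 adaptor.
weight = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
layer = LoRALinear(weight, rank=2)
output = layer.forward([1, 2, 3, 4])
```

Only A and B would be trained for each new subject. For a hypothetical 512 x 512 layer with rank 4, that is 4 * (512 + 512) = 4,096 adaptor parameters against 262,144 in the full weight matrix, which is why such an adaptor can be fitted from limited data.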
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Face recognition and analysis
