SATO: Stable Text-to-Motion Framework

Wenshuo Chen; Hongru Xiao; Erhang Zhang; Lijie Hu; Lei Wang; Mengyuan Liu; Chen Chen

arXiv:2405.01461 · cs.CV · August 19, 2024

TL;DR

This paper introduces SATO, a framework designed to improve the stability of text-to-motion models by addressing output inconsistency issues caused by unstable attention patterns, while maintaining high accuracy.

Contribution

SATO provides a formal framework with modules for stable attention and prediction, enhancing robustness against input perturbations in text-to-motion models.

Findings

01. SATO significantly improves stability against synonym perturbations.

02. SATO maintains high accuracy comparable to existing models.

03. The framework effectively reduces output inconsistency caused by attention instability.

Abstract

Is the text-to-motion model robust? Recent advancements in text-to-motion models primarily stem from more accurate predictions of specific actions. However, the text modality typically relies solely on pre-trained Contrastive Language-Image Pretraining (CLIP) models. Our research has uncovered a significant issue with the text-to-motion model: its predictions are often inconsistent, resulting in vastly different or even incorrect poses when presented with semantically similar or identical text inputs. In this paper, we undertake an analysis to elucidate the underlying causes of this instability, establishing a clear link between the unpredictability of model outputs and the erratic attention patterns of the text encoder module. Consequently, we introduce a formal framework aimed at addressing this issue, which we term the Stable Text-to-Motion Framework (SATO). SATO consists…
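To make the idea of "erratic attention patterns" concrete, here is a minimal sketch of one way attention stability under a synonym substitution could be quantified. It assumes stability is measured as the overlap between the top-k most attended token positions before and after the perturbation; this page does not state how SATO measures it. The function names and the attention weights are hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_indices(weights: Sequence[float], k: int) -> set[int]:
    """Return the indices of the k largest attention weights."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return set(order[:k])


def top_k_overlap(original: Sequence[float], perturbed: Sequence[float], k: int) -> float:
    """Fraction of top-k token positions shared by two attention vectors."""
    if len(original) != len(perturbed):
        raise ValueError("attention vectors must cover the same token positions")
    if not 0 < k <= len(original):
        raise ValueError("k must be between 1 and the number of tokens")
    shared = top_k_indices(original, k) & top_k_indices(perturbed, k)
    return len(shared) / k


# Hypothetical attention weights over five token positions, before and after
# replacing one word with a synonym. These numbers are invented for illustration
# and are not taken from the paper.
attn_original = [0.05, 0.40, 0.10, 0.35, 0.10]
attn_perturbed = [0.30, 0.08, 0.12, 0.38, 0.12]

overlap = top_k_overlap(attn_original, attn_perturbed, k=2)
print(f"top-2 overlap: {overlap:.2f}")
```

An overlap near 1 would mean the encoder attends to the same tokens for both phrasings, while a low overlap indicates the kind of attention shift the abstract links to inconsistent motion outputs.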

Code & Models

Repositories

sato-team/stable-text-to-motion-framework
PyTorch, official