Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation
Luca Cazzola, Ahed Alboody

TL;DR
KineMIC is a transfer learning framework that adapts text-to-motion models for few-shot, kinematically precise action synthesis, significantly improving data augmentation for human activity recognition.
Contribution
It introduces a novel kinetic mining strategy that fine-tunes generalist T2M models for HAR-specific motion generation using semantic correspondences in text embeddings.
Findings
Achieves a +23.1% accuracy improvement in HAR classification.
Generates more coherent and class-discriminative motions.
Effective with only 10 samples per action class.
Abstract
The acquisition cost for large, annotated motion datasets remains a critical bottleneck for skeletal-based Human Activity Recognition (HAR). Although Text-to-Motion (T2M) generative models offer a compelling, scalable source of synthetic data, their training objectives, which emphasize general artistic motion, and dataset structures fundamentally differ from HAR's requirements for kinematically precise, class-discriminative actions. This disparity creates a significant domain gap, making generalist T2M models ill-equipped for generating motions suitable for HAR classifiers. To address this challenge, we propose KineMIC (Kinetic Mining In Context), a transfer learning framework for few-shot action synthesis. KineMIC adapts a T2M diffusion model to an HAR domain by hypothesizing that semantic correspondences in the text encoding space can provide soft supervision for kinematic…
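The idea of using semantic correspondences in the text encoding space can be illustrated with a small sketch. It is not the authors' implementation: the function names (`cosine_similarity`, `mine_top_k`), the top-k selection, the softmax temperature, and the toy 3-dimensional embeddings are all assumptions made for illustration. In practice the embeddings would come from a text encoder, which the abstract does not specify here.

```python
import numpy as np

def cosine_similarity(a, b):
    """Pairwise cosine similarity between rows of a (n, d) and rows of b (m, d)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mine_top_k(target_emb, source_emb, k=2, temperature=0.1):
    """Hypothetical mining step: for each target action label, pick the k
    source captions closest in text-embedding space and turn their
    similarities into soft weights with a softmax."""
    sim = cosine_similarity(target_emb, source_emb)    # (n_targets, n_sources)
    idx = np.argsort(-sim, axis=1)[:, :k]              # indices of the k most similar captions
    top = np.take_along_axis(sim, idx, axis=1)         # their similarity scores
    logits = top / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return idx, weights

# Toy embeddings (assumed values, standing in for real text-encoder outputs).
target_emb = np.array([[1.0, 0.0, 0.0]])   # one few-shot HAR action label
source_emb = np.array([
    [0.9, 0.1, 0.0],   # caption semantically close to the target
    [0.0, 1.0, 0.0],   # unrelated caption
    [0.7, 0.0, 0.7],   # partially related caption
])

idx, weights = mine_top_k(target_emb, source_emb, k=2)
print(idx)      # indices of the mined source captions per target label
print(weights)  # soft weights over the mined captions, each row sums to 1
```

In this sketch the mined source motions, weighted by text similarity, would act as the soft supervision signal; how KineMIC actually uses such correspondences during fine-tuning is described in the paper itself.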
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis
