Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation

Luca Cazzola; Ahed Alboody

arXiv:2512.11654 · cs.CV · January 27, 2026

Open Access

TL;DR

KineMIC is a transfer learning framework that adapts text-to-motion models for few-shot, kinematically precise action synthesis, significantly improving data augmentation for human activity recognition.

Contribution

The paper introduces a kinetic mining strategy that fine-tunes generalist T2M models for HAR-specific motion generation, using semantic correspondences in the text-embedding space.

Findings

1. Achieves a +23.1% accuracy improvement in HAR classification.

2. Generates more coherent and class-discriminative motions.

3. Effective with only 10 samples per action class.

Abstract

The acquisition cost for large, annotated motion datasets remains a critical bottleneck for skeletal-based Human Activity Recognition (HAR). Although Text-to-Motion (T2M) generative models offer a compelling, scalable source of synthetic data, their training objectives, which emphasize general artistic motion, and dataset structures fundamentally differ from HAR's requirements for kinematically precise, class-discriminative actions. This disparity creates a significant domain gap, making generalist T2M models ill-equipped for generating motions suitable for HAR classifiers. To address this challenge, we propose KineMIC (Kinetic Mining In Context), a transfer learning framework for few-shot action synthesis. KineMIC adapts a T2M diffusion model to an HAR domain by hypothesizing that semantic correspondences in the text encoding space can provide soft supervision for kinematic…
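The idea of using semantic correspondences in the text encoding space as soft supervision can be illustrated with a small sketch. This is not the authors' implementation: the function names (`mine_correspondences`, `soft_weights`), the toy 3-dimensional embeddings, the `top_k` value, and the softmax temperature are all assumptions made for illustration. In practice the embeddings would come from the T2M model's text encoder. The sketch ranks source-domain captions by cosine similarity to a target action label and converts the top scores into normalized weights.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_correspondences(target_emb, source_embs, top_k=2):
    """Return the top_k (similarity, caption) pairs for a target label.

    Hypothetical helper: ranks source captions by text-embedding similarity.
    """
    scored = [(cosine(target_emb, emb), caption)
              for caption, emb in source_embs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def soft_weights(scored, temperature=0.1):
    """Turn similarities into softmax weights (temperature is an assumption)."""
    logits = [score / temperature for score, _ in scored]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(logit - peak) for logit in logits]
    total = sum(exps)
    return [(caption, e / total) for (_, caption), e in zip(scored, exps)]


# Toy embeddings standing in for text-encoder outputs (made-up values).
source_embs = {
    "a person waves with the right hand": [0.9, 0.1, 0.0],
    "a person walks forward slowly": [0.1, 0.9, 0.1],
    "a person raises an arm and waves": [0.8, 0.2, 0.1],
}
target_emb = [1.0, 0.0, 0.0]  # stands in for the HAR label "hand waving"

matches = mine_correspondences(target_emb, source_embs, top_k=2)
weights = soft_weights(matches)
for caption, weight in weights:
    print(f"{weight:.3f}  {caption}")
```

The resulting weights could then scale how strongly each retrieved source motion contributes during fine-tuning; how KineMIC actually applies this supervision is described in the full paper, not in the truncated abstract above.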

Taxonomy

Topics: Human Motion and Animation · Human Pose and Action Recognition · Generative Adversarial Networks and Image Synthesis