Guided Attention for Interpretable Motion Captioning

Karim Radouane; Julien Lagarde; Sylvie Ranwez; Andon Tchechmedjiev

arXiv:2310.07324 · cs.CV · September 4, 2024

TL;DR

This paper presents a novel, interpretable motion captioning model that uses guided attention mechanisms to improve caption quality and provide insights into human motion, outperforming non-interpretable state-of-the-art systems.

Contribution

It introduces an attention-guided architecture for motion captioning that enhances interpretability and performance, with methods for guiding attention during training.

Findings

01. Improved captioning performance over state-of-the-art models.

02. Enhanced interpretability through attention guidance.

03. Ability to localize actions and identify body parts.

Abstract

Diverse and extensive work has recently been conducted on text-conditioned human motion generation. However, the reverse direction, motion captioning, has seen comparatively less progress. In this paper, we introduce a novel architecture design that enhances text generation quality by emphasizing interpretability through spatio-temporal and adaptive attention mechanisms. To encourage human-like reasoning, we propose methods for guiding attention during training, emphasizing relevant skeleton areas over time and distinguishing motion-related words. We discuss and quantify our model's interpretability using relevant histograms and density distributions. Furthermore, we leverage interpretability to derive fine-grained information about human motion, including action localization, body part identification, and the distinction of motion-related words. Finally, we discuss the…
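
The idea of guiding attention during training, mentioned in the abstract, can be illustrated with a small sketch. This is not the authors' loss: the abstract does not specify one. It assumes a hypothetical setup in which each motion frame receives an attention score and a binary mask marks the frames considered relevant to the described action. The function names `softmax` and `guidance_penalty`, the mask, and the penalty form (negative log of the attention mass on relevant frames) are all assumptions made for illustration.

```python
import math

def softmax(scores):
    """Turn raw per-frame scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def guidance_penalty(scores, relevant, eps=1e-8):
    """Hypothetical guidance term (an assumption, not the paper's loss).

    scores:   raw attention scores, one per motion frame
    relevant: 0/1 flags marking frames assumed relevant to the action
    Returns the attention weights and -log(attention mass on relevant frames),
    which is small when attention concentrates on the flagged frames.
    """
    attn = softmax(scores)
    mass = sum(a for a, r in zip(attn, relevant) if r)
    return attn, -math.log(mass + eps)

# Frames 1 and 2 are assumed to contain the action.
relevant = [0, 1, 1, 0, 0]
_, aligned = guidance_penalty([0.0, 3.0, 3.0, 0.0, 0.0], relevant)
_, misaligned = guidance_penalty([3.0, 0.0, 0.0, 3.0, 0.0], relevant)
print(aligned < misaligned)
```

In a training loop, a term of this kind would typically be added to the captioning loss with a weighting factor, so that attention is nudged toward the flagged frames rather than forced onto them.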

Code & Models

Repositories

rd20karim/m2t-interpretable
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Human Motion and Animation