Words in Motion: Extracting Interpretable Control Vectors for Motion Transformers

Omer Sahin Tas; Royden Wagner

arXiv:2406.11624·cs.LG·May 19, 2025

TL;DR

This paper introduces a method to interpret and modify hidden states in motion transformers using control vectors, enabling better understanding and zero-shot adaptation with minimal computational cost.

Contribution

We propose a novel approach to extract and refine interpretable control vectors from hidden states of motion transformers for improved interpretability and generalization.

Findings

01. High probing accuracy indicates latent space regularities.

02. Control vectors can modify predictions while preserving feasibility.

03. Refinement with sparse autoencoders improves linearity of modifications.

Abstract

Transformer-based models generate hidden states that are difficult to interpret. In this work, we analyze hidden states and modify them at inference, with a focus on motion forecasting. We use linear probing to analyze whether interpretable features are embedded in hidden states. Our experiments reveal high probing accuracy, indicating latent space regularities with functionally important directions. Building on this, we use the directions between hidden states with opposing features to fit control vectors. At inference, we add our control vectors to hidden states and evaluate their impact on predictions. Remarkably, such modifications preserve the feasibility of predictions. We further refine our control vectors using sparse autoencoders (SAEs). This leads to more linear changes in predictions when scaling control vectors. Our approach enables mechanistic interpretation as well as…
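The control-vector step described in the abstract can be illustrated with a minimal sketch. It assumes a difference-of-means fit between two groups of hidden states with opposing features (for example high speed versus low speed); the abstract only states that directions between such hidden states are used, so the exact fitting procedure, the function names (`fit_control_vector`, `steer`), and the toy numbers below are all hypothetical and not taken from the paper or its repository.

```python
from statistics import fmean


def mean_vector(states):
    """Element-wise mean of a list of equal-length vectors."""
    return [fmean(column) for column in zip(*states)]


def fit_control_vector(states_with, states_without):
    """Assumed fit: difference of group means, pointing from the
    'without' group towards the 'with' group."""
    mu_with = mean_vector(states_with)
    mu_without = mean_vector(states_without)
    return [a - b for a, b in zip(mu_with, mu_without)]


def steer(hidden_state, control_vector, scale=1.0):
    """Add the scaled control vector to a hidden state at inference."""
    return [h + scale * c for h, c in zip(hidden_state, control_vector)]


# Toy 3-dimensional hidden states (made-up values, not model outputs).
high_speed = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
low_speed = [[0.0, 0.0, 1.0], [2.0, 0.0, 1.0]]

control = fit_control_vector(high_speed, low_speed)
steered = steer([1.0, 1.0, 1.0], control, scale=0.5)
```

In this toy setup the two groups differ only along the first dimension, so the fitted vector is nonzero only there, and `scale` sets how far the hidden state is pushed along that direction. In a real model the steered hidden state would be passed on to the remaining layers to produce the modified prediction.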

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 5 · Confidence 4

Strengths

1. The paper applies interpretability approaches to transformer-based models beyond the natural language domain and proposes neural collapse as a metric of interpretability.
2. The authors use sparse autoencoder-based steering to improve control vector linearity for motion control and zero-shot generalization.
3. The authors have done significant work to apply the proposed method to multiple motion forecasting architectures and datasets.

Weaknesses

1. The paper relies heavily on neural collapse as a measure of whether the model learns clusters of interpretable features. The authors should verify that the feature clusters are indeed distinct by comparing the within-class and between-class variance.
2. The L1 sparsity in the training objective is known to induce feature shrinkage and result in poor reconstructions (Wright and Sharkey, 2024). The authors should compare L1, L2, and reconstruction loss for other SAE architectures that do not induce featu

Reviewer 02 · Rating 8 · Confidence 3

Strengths

- **Interpretability:** The method takes motion transformers, and maps hidden states to human-interpretable features, thus clarifying the model’s decision-making process. In general, interpretability is an important area.
- **Controllability:** Control vectors allow manipulation of specific motion features (e.g., speed, acceleration) at inference time without retraining, enabling intuitive model adjustments.
- **Zero-shot Generalization:** The interpretable control vectors support generalizati

Weaknesses

- **Reliance on Neural Collapse:** The method's effectiveness depends on well-defined hidden state clusters. If neural collapse is weak, the extracted features and control vectors may be less reliable.
- **Limited Feature Scope:** The approach primarily focuses on basic motion features (e.g., speed, acceleration, direction) and could be expanded to capture more complex motion patterns and interactions with the environment.
- **Limited Baselines:** Currently, they primarily focus on comparisons

Reviewer 03 · Rating 3 · Confidence 5

Strengths

The problem is well-motivated: building an interpretable and controllable motion prediction network.

Weaknesses

1. I think the overall writing needs to be improved in multiple aspects.
   - First, it is hard to understand why the authors have focused on motion transformers in the introduction. After reading Section 3 (the method part), I can understand why the authors need an interpretable method for motion transformers, but in the introduction, it is explained as an application area of their method, not the main focus.
   - While the authors suggest they have used neural collapse to measure the human interpretable f

Code & Models

Repositories

kit-mrt/future-motion (PyTorch, official)

Taxonomy

Topics: Data Visualization and Analytics · Constraint Satisfaction and Optimization · Data Management and Algorithms

Methods: Focus