EgoVIS@CVPR: What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning

Chi-Hsi Kung; Frangil Ramirez; Juhyung Ha; Yi-Ting Chen; David Crandall; Yi-Hsuan Tsai

arXiv:2506.00101·cs.CV·September 30, 2025

TL;DR

This paper introduces a novel approach for procedure-aware video representation learning by incorporating state-change descriptions and counterfactuals generated by LLMs, enhancing understanding of cause-effect relationships in activities.

Contribution

It proposes using LLM-generated state-change descriptions and counterfactuals as supervision signals, enabling models to better understand scene transformations and failure scenarios in procedural videos.
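One way such supervision could be wired up is sketched below as an InfoNCE-style contrastive loss. The video-clip embedding is pulled toward the embedding of its state-change description and pushed away from the counterfactual descriptions. The summary above does not say how the counterfactuals enter the objective, so treating them as negatives is an assumption. The function names, the temperature value, and the toy 3-dimensional embeddings are likewise hypothetical.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def contrastive_loss(video, positive, negatives, temperature=0.1):
    """InfoNCE-style loss for one video clip.

    Assumption (not confirmed by the source): the state-change description
    is the positive and the counterfactual descriptions are the negatives.
    """
    v = normalize(video)
    logits = [dot(v, normalize(positive)) / temperature]
    logits += [dot(v, normalize(n)) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the positive description.
    return log_denominator - logits[0]


# Toy embeddings; in practice these would come from video and text encoders.
video_clip = [0.9, 0.1, 0.0]
state_change = [1.0, 0.0, 0.0]          # description of what actually changed
counterfactuals = [[0.0, 1.0, 0.0],     # hypothesized failure outcome A
                   [0.0, 0.0, 1.0]]     # hypothesized failure outcome B

loss_aligned = contrastive_loss(video_clip, state_change, counterfactuals)
loss_misaligned = contrastive_loss(
    video_clip, counterfactuals[0], [state_change, counterfactuals[1]]
)
print(loss_aligned < loss_misaligned)
```

The loss is small when the clip embedding sits closest to its true state-change description and grows when a counterfactual is the nearest text, which is the behaviour a supervision signal of this kind would be expected to reward.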

Findings

1. Significant improvements in temporal action segmentation.
2. Enhanced error detection capabilities.
3. Effective modeling of cause-effect in procedural activities.

Abstract

Understanding a procedural activity requires modeling both how action steps transform the scene and how evolving scene transformations can influence the sequence of action steps, even those that are accidental or erroneous. Yet existing work on procedure-aware video representations fails to explicitly learn the state changes (scene transformations). In this work, we study procedure-aware video representation learning by incorporating state-change descriptions generated by LLMs as supervision signals for video encoders. Moreover, we generate state-change counterfactuals that simulate hypothesized failure outcomes, allowing models to learn by imagining unseen "What if" scenarios. This counterfactual reasoning facilitates the model's ability to understand the cause and effect of each step in an activity. To verify the procedure awareness of our model, we conduct extensive…

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning