Video-Skill-CoT: Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning
Daeun Lee, Jaehong Yoon, Jaemin Cho, Mohit Bansal

TL;DR
Video-SKoT is a skill-aware Chain-of-Thought framework for domain-specific video reasoning. It automatically constructs skill-based CoT annotations and trains skill-specific expert modules, and it outperforms strong baselines on three video understanding benchmarks.
Contribution
The paper presents a novel method for constructing skill-based CoT annotations and training skill-specific experts, enabling better domain adaptation in video reasoning tasks.
Findings
Outperforms strong baselines on three video understanding benchmarks.
Effectively leverages domain-relevant reasoning skills for improved accuracy.
Provides detailed analysis of skill annotation and expert learning processes.
Abstract
Recent advances in Chain-of-Thought (CoT) reasoning have improved complex video understanding, but existing methods often struggle to adapt to domain-specific skills (e.g., event detection, spatial relation understanding, emotion understanding) over various video content. To address this, we propose Video-Skill-CoT (a.k.a. Video-SKoT), a framework that automatically constructs and leverages skill-aware CoT supervisions for domain-adaptive video reasoning. First, we construct skill-based CoT annotations: we extract domain-relevant reasoning skills from training questions, cluster them into a shared skill taxonomy, and create detailed multi-step CoT rationale tailored to each video-question pair for training. Second, we introduce a skill-specific expert learning framework. Each expert module specializes in a subset of reasoning skills and is trained with lightweight adapters using the…
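The two stages described in the abstract (grouping extracted skills into a shared taxonomy, then sending each question to an expert that covers its skills) can be illustrated with a toy sketch. The abstract does not say how skills are clustered or how experts are selected, so everything here is an assumption made for illustration: token-overlap (Jaccard) similarity stands in for whatever clustering the authors use, majority voting stands in for expert assignment, and the names `cluster_skills`, `build_router`, and `route_question` are hypothetical.

```python
from collections import Counter


def tokens(skill):
    """Lowercased word set of a skill description."""
    return set(skill.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_skills(skills, threshold=0.3):
    """Greedy single-pass clustering (an assumed stand-in, not the paper's method).

    Each skill joins the first cluster whose seed (first member) is at least
    `threshold` similar; otherwise it starts a new cluster.
    """
    clusters = []
    for skill in skills:
        for cluster in clusters:
            if jaccard(tokens(skill), tokens(cluster[0])) >= threshold:
                cluster.append(skill)
                break
        else:
            clusters.append([skill])
    return clusters


def build_router(clusters):
    """Map every skill string to the index of its cluster (one expert per cluster)."""
    return {skill: idx for idx, cluster in enumerate(clusters) for skill in cluster}


def route_question(question_skills, router):
    """Pick the expert whose cluster covers the most of the question's skills."""
    votes = Counter(router[s] for s in question_skills if s in router)
    return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    skills = [
        "detect event order",
        "detect event boundaries",
        "understand spatial relation",
        "understand emotion of speaker",
        "spatial relation between objects",
    ]
    clusters = cluster_skills(skills)
    router = build_router(clusters)
    expert = route_question(
        ["detect event boundaries", "spatial relation between objects", "detect event order"],
        router,
    )
    print(clusters, expert)
```

In this sketch each cluster plays the role of one expert's skill subset. In a real system the string matching would presumably be replaced by learned embeddings, and each expert index would select a separately trained lightweight adapter.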
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Explainable Artificial Intelligence (XAI)
