Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation

Yudi Shi; Shangzhe Di; Qirui Chen; Weidi Xie

arXiv:2412.01694 · cs.CV · February 14, 2025

TL;DR

This paper introduces Agent-of-Thoughts Distillation (AoTD), a method that improves VideoQA models by distilling automatically generated reasoning chains, filtered by an LLM-based verification step, into instruction tuning, leading to better performance and explainability.

Contribution

The paper proposes AoTD, a new approach that incorporates automatically generated reasoning chains and verification to enhance VideoQA models' reasoning and explainability.

Findings

01

AoTD improves performance on multiple VideoQA benchmarks.

02

The method enhances model explainability through reasoning chains.

03

The verification mechanism increases the reliability of the generated reasoning chains.

Abstract

This paper tackles the problem of video question answering (VideoQA), a task that often requires multi-step reasoning and a profound understanding of spatial-temporal dynamics. While large video-language models perform well on benchmarks, they often lack explainability and spatial-temporal grounding. In this paper, we propose Agent-of-Thoughts Distillation (AoTD), a method that enhances models by incorporating automatically generated Chain-of-Thoughts (CoTs) into the instruction-tuning process. Specifically, we leverage an agent-based system to decompose complex questions into sub-tasks and address them with specialized vision models; the intermediate results are then treated as reasoning chains. We also introduce a verification mechanism using a large language model (LLM) to ensure the reliability of the generated CoTs. Extensive experiments demonstrate that AoTD improves the performance…
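To make the data-construction step described in the abstract more concrete, here is a minimal sketch in Python under stated assumptions: an agent's execution trace (sub-questions, the vision tool that handled each, and its intermediate result) is rendered as a text reasoning chain, filtered, and packaged as an instruction-tuning example. Everything here is hypothetical: the `Step`/`Trace` schema, the tool names, the answer-match filter, and the `verifier` callable (a toy stand-in for the LLM-based check). The abstract does not specify trace formats or how verification is implemented, so this is an illustration, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One sub-task in an agent's execution trace (hypothetical schema)."""
    sub_question: str  # sub-task produced by decomposing the question
    tool: str          # name of the specialized vision model that handled it
    result: str        # that model's intermediate output


@dataclass
class Trace:
    question: str
    steps: List[Step]
    answer: str


def trace_to_cot(trace: Trace) -> str:
    """Render the intermediate results as a natural-language reasoning chain."""
    lines = [
        f"Step {i}: {s.sub_question} -> [{s.tool}] {s.result}"
        for i, s in enumerate(trace.steps, start=1)
    ]
    lines.append(f"Answer: {trace.answer}")
    return "\n".join(lines)


def build_example(
    trace: Trace,
    gold_answer: str,
    verifier: Callable[[str, str], bool],
) -> Optional[dict]:
    """Return an instruction-tuning example, or None if the chain is rejected."""
    # Assumption: discard traces whose final answer disagrees with the label.
    if trace.answer.strip().lower() != gold_answer.strip().lower():
        return None
    cot = trace_to_cot(trace)
    # Hypothetical stand-in for the LLM-based reliability check.
    if not verifier(trace.question, cot):
        return None
    return {"instruction": trace.question, "response": cot}


# Usage with made-up data; the lambda is a toy verifier, not an LLM.
trace = Trace(
    question="What does the person do after opening the fridge?",
    steps=[
        Step("When is the fridge opened?", "temporal_grounder", "frames 40-55"),
        Step("What happens after frame 55?", "action_recognizer", "takes out milk"),
    ],
    answer="takes out milk",
)
example = build_example(trace, "Takes out milk", verifier=lambda q, c: "Answer:" in c)
print(example["response"])
```

Keeping the verifier as a plain callable leaves the choice of checker open: in a real setup it would wrap a call to an LLM, while here it only lets the filtering logic be exercised end to end.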

Taxonomy

Topics: Data Stream Mining Techniques · Data Visualization and Analytics