Bidirectional Action Sequence Learning for Long-term Action Anticipation with Large Language Models

Yuji Sato; Yasunori Ishii; Takayoshi Yamashita

arXiv:2508.00374 · cs.CV · August 4, 2025

Open Access

TL;DR

This paper introduces BiAnt, a method for long-term action anticipation that combines forward and backward action sequence prediction using a large language model. On Ego4D it improves edit distance over baseline methods, and it is designed to capture semantically distinct sub-actions within a scene.

Contribution

The paper presents a bidirectional learning approach that uses a large language model for long-term action anticipation in videos.

Findings

01. BiAnt outperforms baseline methods in edit distance on the Ego4D dataset.

02. Bidirectional prediction captures semantic sub-actions more effectively.

03. Using a large language model improves long-term action anticipation performance.
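
To make the edit-distance metric in the first finding concrete, here is a minimal Python sketch of a Levenshtein distance over action-label sequences. It is an illustration under stated assumptions, not the benchmark's official scorer: the function names, the normalization by the longer sequence, and the example labels are all hypothetical, and the exact edit-distance variant used for Ego4D may differ.

```python
def edit_distance(pred, target):
    """Levenshtein distance between two sequences of action labels.

    Counts the minimum number of insertions, deletions, and
    substitutions needed to turn `pred` into `target`.
    """
    # prev[j] holds the distance between the first i-1 items of pred
    # and the first j items of target.
    prev = list(range(len(target) + 1))
    for i, p in enumerate(pred, start=1):
        curr = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if p == t else 1
            curr.append(min(prev[j] + 1,          # delete p
                            curr[j - 1] + 1,      # insert t
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def normalized_edit_distance(pred, target):
    """Edit distance scaled to [0, 1] by the longer sequence (an assumption)."""
    longest = max(len(pred), len(target))
    return edit_distance(pred, target) / longest if longest else 0.0


# Hypothetical action labels, for illustration only.
target = ["take knife", "cut onion", "put knife", "take pan"]
pred = ["take knife", "cut onion", "take pan", "put knife"]

print(edit_distance(pred, target))             # 2
print(normalized_edit_distance(pred, target))  # 0.5
```

Lower is better: a prediction that matches the ground-truth sequence exactly scores 0.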

Abstract

Video-based long-term action anticipation is crucial for early risk detection in areas such as automated driving and robotics. Conventional approaches extract features from past actions using encoders and predict future events with decoders, which limits performance due to their unidirectional nature. These methods struggle to capture semantically distinct sub-actions within a scene. The proposed method, BiAnt, addresses this limitation by combining forward prediction with backward prediction using a large language model. Experimental results on Ego4D demonstrate that BiAnt improves performance in terms of edit distance compared to baseline methods.
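
The abstract does not say how forward and backward prediction are presented to the language model. As one possible reading, the Python sketch below builds a forward example (observed actions to future actions) and a backward example (future actions, reversed, to observed actions, reversed) from a single labeled sequence. The prompt format, the direction tags, the function names, and the action labels are all assumptions made for illustration, not BiAnt's actual training format.

```python
def format_example(context, target, direction):
    """Build one text-to-text example; the prompt layout is hypothetical."""
    return {
        "direction": direction,
        "prompt": f"[{direction}] " + ", ".join(context) + " =>",
        "completion": " " + ", ".join(target),
    }


def make_bidirectional_examples(actions, n_observed):
    """Split a sequence and return (forward, backward) examples.

    Forward: predict the future actions from the observed ones.
    Backward: predict the observed actions from the future ones,
    with both sides reversed so the model reads back in time.
    """
    if not 0 < n_observed < len(actions):
        raise ValueError("n_observed must leave at least one action on each side")
    observed, future = actions[:n_observed], actions[n_observed:]
    forward = format_example(observed, future, "forward")
    backward = format_example(future[::-1], observed[::-1], "backward")
    return forward, backward


# Hypothetical action labels, for illustration only.
actions = ["take knife", "cut onion", "put knife", "take pan"]
forward, backward = make_bidirectional_examples(actions, n_observed=2)

print(forward["prompt"], forward["completion"])
print(backward["prompt"], backward["completion"])
```

Under this reading, both examples come from the same annotated sequence, so the backward task adds a second training signal without requiring extra labels.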

Taxonomy

Topics: Human Pose and Action Recognition · Autonomous Vehicle Technology and Safety · Social Robot Interaction and HRI