Zero-shot Compositional Action Recognition with Neural Logic Constraints

Gefan Ye; Lin Li; Kexin Li; Jun Xiao; Long Chen

arXiv:2508.02320·cs.CV·August 12, 2025

TL;DR

This paper introduces LogicCAR, a neural framework for zero-shot compositional action recognition that adds symbolic logic constraints on compositional structure and semantic hierarchy, with the aim of generalizing better to unseen verb-object compositions.

Contribution

The paper proposes a logic-driven approach that embeds compositional and hierarchical constraints into a neural network for zero-shot compositional action recognition.

Findings

01 Outperforms baseline methods on Sth-com dataset

02 Effectively models compositional and hierarchical structures

03 Enhances reasoning capacity in zero-shot scenarios

Abstract

Zero-shot compositional action recognition (ZS-CAR) aims to identify unseen verb-object compositions in videos by exploiting knowledge of verb and object primitives learned during training. Despite the progress of compositional learning in ZS-CAR, two critical challenges persist: 1) a missing compositional structure constraint, which leads to spurious correlations between primitives; 2) a neglected semantic hierarchy constraint, which leads to semantic ambiguity and impairs the training process. In this paper, we argue that human-like symbolic reasoning offers a principled solution to these challenges by explicitly modeling compositional and hierarchical structured abstraction. To this end, we propose a logic-driven ZS-CAR framework, LogicCAR, that integrates dual symbolic constraints: Explicit Compositional Logic and Hierarchical Primitive Logic. Specifically, the former models the…
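
To make the task setup concrete, here is a minimal sketch of how an unseen verb-object pair might be scored from primitive predictions, and how a logic rule could be relaxed into a soft penalty. It is not LogicCAR's actual formulation, which the truncated abstract does not spell out. The product relaxation of "verb AND object", the hinge-style implication penalty, the function names, and all labels and probabilities are assumptions made up for illustration.

```python
# Illustrative sketch only; not the formulation used in the paper.

def composition_scores(verb_probs, object_probs, candidate_pairs):
    """Score each (verb, object) pair as the product of its primitive
    probabilities, a common soft relaxation of 'verb AND object'."""
    return {(v, o): verb_probs[v] * object_probs[o] for v, o in candidate_pairs}


def implication_violation(p_child, p_parent):
    """Soft penalty for a rule 'child implies parent' (hypothetical hierarchy
    rule). It is zero whenever the parent is at least as probable as the child."""
    return max(0.0, p_child - p_parent)


# Made-up primitive predictions for a single video clip.
verb_probs = {"open": 0.7, "close": 0.2, "push": 0.1}
object_probs = {"door": 0.6, "box": 0.3, "bottle": 0.1}

# Verb-object pairs assumed to be absent from training (the zero-shot targets).
unseen_pairs = [("open", "bottle"), ("close", "door"), ("push", "box")]

scores = composition_scores(verb_probs, object_probs, unseen_pairs)
best = max(scores, key=scores.get)
print(best, scores[best])
```

Restricting the argmax to a fixed candidate set mirrors the closed-world evaluation often used in compositional zero-shot work. In a trainable model, a penalty such as implication_violation would typically be added to the loss rather than applied at inference time.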

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Action Observation and Synchronization