DisenQ: Disentangling Q-Former for Activity-Biometrics

Shehreen Azad; Yogesh S Rawat

arXiv:2507.07262 · cs.CV · July 11, 2025

TL;DR

DisenQ introduces a transformer-based framework that uses structured textual supervision to disentangle identity features from motion and appearance variations in activity-biometrics, achieving state-of-the-art results.

Contribution

The paper proposes DisenQ, a novel language-guided transformer that disentangles biometric features from motion and appearance, improving activity-based person identification.

Findings

01

Achieves state-of-the-art performance on three activity-based benchmarks.

02

Demonstrates strong generalization to real-world scenarios.

03

Outperforms methods that rely on additional visual data such as pose or silhouette.

Abstract

In this work, we address activity-biometrics, which involves identifying individuals across a diverse set of activities. Unlike traditional person identification, this setting introduces additional challenges as identity cues become entangled with motion dynamics and appearance variations, making biometrics feature learning more complex. While additional visual data like pose and/or silhouette help, they often suffer from extraction inaccuracies. To overcome this, we propose a multimodal language-guided framework that replaces reliance on additional visual data with structured textual supervision. At its core, we introduce DisenQ (Disentangling Q-Former), a unified querying transformer that disentangles biometrics, motion, and non-biometrics features by leveraging structured language guidance. This ensures identity cues remain independent of appearance and…

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Context-Aware Activity Recognition Systems