DisenQ: Disentangling Q-Former for Activity-Biometrics
Shehreen Azad, Yogesh S Rawat

TL;DR
DisenQ introduces a transformer-based framework that uses structured textual supervision to disentangle identity features from motion and appearance variations in activity-biometrics, achieving state-of-the-art results.
Contribution
The paper proposes DisenQ, a novel language-guided transformer that disentangles biometric features from motion and appearance, improving activity-based person identification.
Findings
Achieves state-of-the-art performance on three activity-based benchmarks.
Demonstrates strong generalization to real-world scenarios.
Outperforms methods that rely on additional visual data such as pose or silhouettes.
Abstract
In this work, we address activity-biometrics, which involves identifying individuals across a diverse set of activities. Unlike traditional person identification, this setting introduces additional challenges as identity cues become entangled with motion dynamics and appearance variations, making biometrics feature learning more complex. While additional visual data like pose and/or silhouette help, they often suffer from extraction inaccuracies. To overcome this, we propose a multimodal language-guided framework that replaces reliance on additional visual data with structured textual supervision. At its core, we introduce DisenQ (Disentangling Q-Former), a unified querying transformer that disentangles biometrics, motion, and non-biometrics features by leveraging structured language guidance. This ensures identity cues remain independent of appearance and…
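The querying idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes three groups of query vectors (one each for biometrics, motion, and non-biometrics) that cross-attend over a shared set of visual tokens, and it uses random placeholder vectors where a real system would use learned queries, video features, and text embeddings. The names `cross_attend`, `disentangle`, `text_embeddings`, and `independence_penalty` are hypothetical, and the squared-cosine penalty is just one possible way to express "identity cues remain independent of appearance".

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)) + 1e-12)


def cross_attend(queries, tokens):
    """Single-head cross-attention with no projections (a simplification):
    each query returns an attention-weighted average of the tokens."""
    dim = len(tokens[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, t) / math.sqrt(dim) for t in tokens])
        outputs.append(
            [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(dim)]
        )
    return outputs


def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def disentangle(tokens, query_groups):
    """Return one pooled feature vector per (hypothetical) query group."""
    return {
        name: mean_pool(cross_attend(queries, tokens))
        for name, queries in query_groups.items()
    }


# Placeholder data: random stand-ins for learned queries, visual tokens,
# and text embeddings. None of these values come from the paper.
rng = random.Random(0)
DIM, N_TOKENS, N_QUERIES = 8, 16, 4
GROUPS = ("biometrics", "motion", "non_biometrics")


def rand_vec():
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


tokens = [rand_vec() for _ in range(N_TOKENS)]
query_groups = {name: [rand_vec() for _ in range(N_QUERIES)] for name in GROUPS}
text_embeddings = {name: rand_vec() for name in GROUPS}

features = disentangle(tokens, query_groups)

# Assumed language guidance: similarity of each feature to its text embedding.
alignment = {name: cosine(features[name], text_embeddings[name]) for name in GROUPS}

# Assumed independence term: penalize overlap between identity and appearance.
independence_penalty = cosine(features["biometrics"], features["non_biometrics"]) ** 2
```

In a trained model the alignment scores would be pushed up and the penalty pushed down by the loss; here they are only computed to show where such terms could attach.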
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Context-Aware Activity Recognition Systems
