Face, Whole-Person, and Object Classification in a Unified Space Via The Interleaved Multi-Domain Identity Curriculum

Thomas M Metz; Matthew Q Hill; Alice J O'Toole

arXiv:2511.19846 · cs.CV · November 26, 2025

TL;DR

This paper introduces the Interleaved Multi-Domain Identity Curriculum (IMIC), a training method that lets a fine-tuned foundation model perform object, face, and whole-person recognition in a single shared embedding space without substantial catastrophic forgetting.

Contribution

The paper presents IMIC, a novel interleaved training schedule that allows fine-tuning foundation models on multiple recognition tasks concurrently, maintaining generalization and outperforming prior methods.

Findings

01. IMIC enables multi-task recognition in a single embedding space.

02. EVA-02 and CLIP models performed comparably with domain experts on all four tasks concurrently.

03. The approach preserves out-of-distribution generalization.

Abstract

Vision foundation models can perform generalized object classification in zero-shot mode, and face/person recognition when they are fine-tuned. However, fine-tuned models suffer from catastrophic forgetting. We create models that perform four tasks (object recognition, face recognition from high- and low-quality images, and person recognition from whole-body images) in a single embedding space -- without incurring substantial catastrophic forgetting. To accomplish this, we introduce two variants of the Interleaved Multi-Domain Identity Curriculum (IMIC): a gradient-coupled, interleaving training schedule that fine-tunes a foundation backbone simultaneously on all four tasks. The IMIC method proved effective with three foundation model bases: DINOv3, CLIP, and EVA-02. Two of these (EVA-02 and CLIP) performed comparably with domain experts on all four tasks concurrently and were more…
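The contrast between sequential fine-tuning and an interleaved schedule can be illustrated with a toy sketch. This is not the paper's implementation: it assumes that "gradient-coupled" means gradients from all four tasks are combined (here, averaged) before each shared update, and it stands in for the backbone with a single scalar parameter. The task names follow the abstract; `TARGETS`, `train_sequential`, `train_interleaved`, and `total_loss` are hypothetical names introduced only for illustration.

```python
# Toy illustration only. Assumption: an interleaved, gradient-coupled step
# averages the gradients of all tasks before one shared parameter update.
# The actual IMIC schedule may differ.

# The four tasks named in the abstract.
TASKS = ["objects", "faces_hq", "faces_lq", "whole_body"]

# Hypothetical per-task optimum for a 1-D shared parameter (a stand-in
# for the weights of a foundation backbone).
TARGETS = {"objects": 0.0, "faces_hq": 1.0, "faces_lq": 2.0, "whole_body": 3.0}


def grad(task, w):
    """Gradient of the toy loss 0.5 * (w - target)**2 for one task."""
    return w - TARGETS[task]


def train_sequential(w, steps_per_task, lr):
    """Fine-tune on one task after another; later tasks overwrite earlier ones."""
    for task in TASKS:
        for _ in range(steps_per_task):
            w -= lr * grad(task, w)
    return w


def train_interleaved(w, steps, lr):
    """Each step visits every task and applies one update from the averaged gradient."""
    for _ in range(steps):
        g = sum(grad(task, w) for task in TASKS) / len(TASKS)
        w -= lr * g
    return w


def total_loss(w):
    """Sum of the toy losses over all four tasks."""
    return sum(0.5 * (w - TARGETS[t]) ** 2 for t in TASKS)


w_seq = train_sequential(0.0, steps_per_task=200, lr=0.1)
w_int = train_interleaved(0.0, steps=200, lr=0.1)
print("sequential:", w_seq, "total loss:", total_loss(w_seq))
print("interleaved:", w_int, "total loss:", total_loss(w_int))
```

In this toy setting the sequential run settles at the optimum of the last task it saw, which is the scalar analogue of catastrophic forgetting, while the interleaved run settles at a compromise with a lower total loss across tasks. A real backbone has enough capacity that the compromise need not cost much on any single task, which is the outcome the abstract reports.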

Taxonomy

Topics: Face recognition and analysis · Face Recognition and Perception · Face and Expression Recognition