Omni-Attribute: Open-vocabulary Attribute Encoder for Visual Concept Personalization

Tsai-Shien Chen; Aliaksandr Siarohin; Gordon Guocheng Qian; Kuan-Chieh Jackson Wang; Egor Nemchinov; Moayed Haji-Ali; Riza Alp Guler; Willi Menapace; Ivan Skorokhodov; Anil Kag; Jun-Yan Zhu; Sergey Tulyakov

arXiv:2512.10955 · cs.CV · May 1, 2026

TL;DR

Omni-Attribute introduces an open-vocabulary attribute encoder that learns disentangled, attribute-specific image representations, enabling improved personalization and generation without information leakage.

Contribution

Omni-Attribute is the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. It relies on curated training data and a dual-objective training paradigm to achieve disentanglement.

Findings

01. Achieves state-of-the-art results in attribute retrieval and personalization.
02. Effectively disentangles visual attributes for coherent image synthesis.
03. Outperforms existing methods across multiple benchmarks.

Abstract

Visual concept personalization aims to transfer only specific image attributes, such as identity, expression, lighting, and style, into unseen contexts. However, existing methods rely on holistic embeddings from general-purpose image encoders, which entangle multiple visual factors and make it difficult to isolate a single attribute. This often leads to information leakage and incoherent synthesis. To address this limitation, we introduce Omni-Attribute, the first open-vocabulary image attribute encoder designed to learn high-fidelity, attribute-specific representations. Our approach jointly designs the data and model: (i) we curate semantically linked image pairs annotated with positive and negative attributes to explicitly teach the encoder what to preserve or suppress; and (ii) we adopt a dual-objective training paradigm that balances generative fidelity with contrastive…
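The abstract above is truncated before it specifies the contrastive term, so the exact objective is not available here. As a rough illustration only, the sketch below assumes (i) each curated training example can be stored as a linked image pair with lists of positive (preserve) and negative (suppress) attributes, and (ii) the two objectives are combined as a weighted sum of a generative loss and an InfoNCE-style contrastive loss. InfoNCE is a generic stand-in, not necessarily the authors' choice, and every name here (`AttributePair`, `info_nce`, `dual_objective`, `weight`, `temperature`) is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class AttributePair:
    """Hypothetical record for one curated, semantically linked image pair."""
    reference_image: str            # ID or path of the source image (assumed field)
    target_image: str               # ID or path of the linked image (assumed field)
    positive_attributes: list[str]  # attributes the encoder should preserve
    negative_attributes: list[str]  # attributes the encoder should suppress


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor: list[float], positive: list[float],
             negatives: list[list[float]], temperature: float = 0.07) -> float:
    """Generic InfoNCE loss: -log softmax score of the positive among all candidates.

    Assumption: `anchor` and `positive` are attribute embeddings that should match,
    and `negatives` are embeddings that should be pushed away.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def dual_objective(gen_loss: float, anchor: list[float], positive: list[float],
                   negatives: list[list[float]], weight: float = 0.5,
                   temperature: float = 0.07) -> float:
    """Assumed weighted sum of a generative loss and a contrastive loss."""
    return gen_loss + weight * info_nce(anchor, positive, negatives, temperature)


if __name__ == "__main__":
    pair = AttributePair("ref.png", "tgt.png", ["identity"], ["lighting", "background"])
    loss = dual_objective(
        gen_loss=1.2,                       # placeholder value from a generator
        anchor=[1.0, 0.0],                  # toy 2-D embeddings
        positive=[0.9, 0.1],
        negatives=[[0.0, 1.0], [-1.0, 0.0]],
    )
    print(pair.positive_attributes, round(loss, 3))
```

The weighted sum makes the trade-off between generative fidelity and contrastive separation explicit through a single `weight` knob; how the actual method balances the two terms is not stated in this excerpt.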
