BRAT: Bonus oRthogonAl Token for Architecture Agnostic Textual Inversion
James Baker

TL;DR
This paper introduces BRAT, a novel approach for textual inversion that employs bonus tokens and a vision transformer to improve personalization of diffusion models across different architectures.
Contribution
The paper proposes a method that combines bonus tokens with an orthogonality constraint, enabling architecture-agnostic textual inversion that does not rely on the UNet or its idiosyncratic layers.
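One common way to "enforce orthogonality" between learned token embeddings is a soft penalty on their pairwise cosine similarities. The sketch below assumes that form and assumes the constraint applies pairwise among the placeholder token and the bonus tokens; this page does not give BRAT's exact loss, so `orthogonality_penalty`, the token names, and the 3-d vectors are all hypothetical illustrations.

```python
import math
from typing import List, Sequence


def unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def orthogonality_penalty(embeddings: Sequence[Sequence[float]]) -> float:
    """Sum of squared cosine similarities over all distinct pairs.

    The penalty is 0 exactly when every pair of nonzero embeddings is
    orthogonal, and it grows as any two embeddings point in similar directions.
    (Assumed formulation, not taken from the paper.)
    """
    units = [unit(e) for e in embeddings]
    penalty = 0.0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            cos = sum(a * b for a, b in zip(units[i], units[j]))
            penalty += cos * cos
    return penalty


# Hypothetical 3-d embeddings: a placeholder token "<S*>" and two bonus tokens.
placeholder = [1.0, 0.0, 0.0]
bonus = [[0.0, 2.0, 0.0], [1.0, 1.0, 0.0]]
penalty = orthogonality_penalty([placeholder] + bonus)
print(f"orthogonality penalty: {penalty:.3f}")
```

Here the first bonus token is already orthogonal to the placeholder, while the second sits at 45 degrees to both of the others, so each of those two pairs contributes 0.5 and the total is 1.0. Squaring the cosine penalizes both aligned and anti-aligned pairs, which matches the goal of pushing the tokens toward mutually orthogonal directions rather than merely apart.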
Findings
Bonus tokens improve adherence to source images
Vision transformer enhances adherence to prompts
Method is architecture-agnostic and improves personalization
Abstract
Textual inversion remains a popular method for personalizing diffusion models, teaching them new subjects and styles. We note that textual inversion has been underexplored with denoising backbones other than the UNet, and we experiment with textual inversion using a vision transformer. We also seek to optimize textual inversion with a strategy that does not require explicit use of the UNet and its idiosyncratic layers, so we add bonus tokens and enforce orthogonality. We find that the bonus token improves adherence to the source images and that the vision transformer improves adherence to the prompt. Code is available at https://github.com/jamesBaker361/tex_inv_plus.
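The defining property of textual inversion is that the generative model stays frozen and only the embeddings of newly added tokens are optimized. The toy sketch below illustrates that idea together with an orthogonality term; it is not BRAT's objective. A fixed 3x3 matrix `W` applied to the mean token embedding is a hypothetical stand-in for the frozen denoiser (UNet or vision transformer), `target` stands in for the denoising target, and the weight `lam`, learning rate, and initial vectors are arbitrary assumptions.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def residual(emb: List[Vector], w: Matrix, target: Vector) -> Vector:
    """Frozen toy 'model': apply W to the mean embedding, compare to target."""
    n = len(emb)
    mean = [sum(col) / n for col in zip(*emb)]
    return [p - t for p, t in zip(matvec(w, mean), target)]


def total_loss(emb: List[Vector], w: Matrix, target: Vector, lam: float) -> float:
    """Reconstruction error plus lam * sum of squared pairwise dot products."""
    r = residual(emb, w, target)
    n = len(emb)
    orth = sum(dot(emb[i], emb[j]) ** 2 for i in range(n) for j in range(i + 1, n))
    return dot(r, r) + lam * orth


def train_step(emb: List[Vector], w: Matrix, target: Vector,
               lam: float, lr: float) -> List[Vector]:
    """One gradient step on the token embeddings only; W is never updated."""
    n = len(emb)
    # d/d e_k of ||W·mean - target||^2 is (2/n) W^T r, identical for every k.
    g_rec = [2.0 / n * g for g in matvec(transpose(w), residual(emb, w, target))]
    updated = []
    for k in range(n):
        # d/d e_k of sum_{i<j} (e_i·e_j)^2 is sum_{j != k} 2 (e_k·e_j) e_j.
        g_orth = [0.0] * len(emb[k])
        for j in range(n):
            if j != k:
                c = 2.0 * dot(emb[k], emb[j])
                g_orth = [g + c * x for g, x in zip(g_orth, emb[j])]
        updated.append([e - lr * (gr + lam * go)
                        for e, gr, go in zip(emb[k], g_rec, g_orth)])
    return updated


# Hypothetical setup: a frozen 3x3 map and two learnable tokens
# (a placeholder "<S*>" and one bonus token), initialised arbitrarily.
W = [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 2.0]]
target = [1.0, 1.0, 0.5]
tokens = [[1.0, 0.2, 0.0], [0.8, 0.4, 0.1]]

start = total_loss(tokens, W, target, lam=1.0)
for _ in range(300):
    tokens = train_step(tokens, W, target, lam=1.0, lr=0.02)
end = total_loss(tokens, W, target, lam=1.0)
print(f"loss {start:.4f} -> {end:.4f}")
```

In a real pipeline the gradients would come from automatic differentiation through the frozen diffusion backbone, and the reconstruction term would be the usual noise-prediction loss; the point of the toy is only that `W` receives no updates while the token vectors move toward fitting the target and toward being mutually orthogonal.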
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Residual Connection · Layer Normalization · Multi-Head Attention · Vision Transformer · Diffusion
