Ortho-Hydra: Orthogonalized Experts for DiT LoRA

Seunghyun Ji

arXiv:2605.03252 · cs.LG · May 6, 2026

TL;DR

Ortho-Hydra introduces a novel re-parameterization of mixture-of-experts LoRA for fine-tuning diffusion transformers (DiT). By orthogonalizing the shared basis and giving each expert its own disjoint subspace, it improves expert specialization and reduces style bleed on multi-style data.

Contribution

It proposes Ortho-Hydra, a method that combines an orthogonal shared basis with disjoint per-expert subspaces to improve expert specialization in LoRA fine-tuning.
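The abstract describes the shared basis as OFT-style and Cayley-orthogonal. The sketch below shows the Cayley map in NumPy, assuming the common convention R = (I + S)(I - S)^{-1} with S = P - P^T skew-symmetric. `cayley` is a hypothetical helper name, and how Ortho-Hydra sizes and applies R (block size, which side of the weight it multiplies) is not stated here.

```python
import numpy as np

def cayley(P):
    """Map an unconstrained square matrix P to an orthogonal matrix (Cayley transform).

    S = P - P^T is skew-symmetric, so its eigenvalues are purely imaginary and
    I - S is always invertible. R = (I + S)(I - S)^{-1} then satisfies R^T R = I
    and det(R) = +1. Since I + S and I - S commute, R = solve(I - S, I + S).
    """
    S = P - P.T
    I = np.eye(P.shape[0])
    return np.linalg.solve(I - S, I + S)

# At P = 0 the map returns the identity, so a zero-initialized parameter leaves
# the pretrained basis unrotated at the start of training.
print(np.allclose(cayley(np.zeros((8, 8))), np.eye(8)))  # True

rng = np.random.default_rng(0)
R = cayley(0.1 * rng.standard_normal((8, 8)))
print(np.allclose(R.T @ R, np.eye(8)))  # True
```

The Cayley map keeps the parameter unconstrained (any square P is valid) while guaranteeing an orthogonal rotation, which is why orthogonal fine-tuning methods commonly prefer it to explicit re-orthogonalization.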

Findings

01

Ortho-Hydra's router moves away from the uniform prior (de-uniformises) faster than baselines, within hundreds of steps.

02

Disjoint expert subspaces give each expert, and hence the router, a distinct gradient signal at initialization.

03

Code is available at https://github.com/sorryhyun/anima_lora.
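Finding 02 credits disjoint expert subspaces, which the paper's abstract says are carved from the top-$(Er)$ left singular vectors of the pretrained weight. The following is a minimal NumPy sketch of one way to carve them, assuming contiguous rank-ordered blocks (expert $i$ gets singular directions $ir$ through $(i+1)r - 1$). The actual assignment in the repository may differ, and `disjoint_expert_subspaces` is a hypothetical name.

```python
import numpy as np

def disjoint_expert_subspaces(W, E, r):
    """Split the top-(E*r) left singular vectors of W into E disjoint r-dim column blocks.

    Assumption: expert i receives singular directions i*r .. (i+1)*r - 1, in rank order.
    """
    if E * r > min(W.shape):
        raise ValueError("E * r must not exceed min(W.shape)")
    U, _, _ = np.linalg.svd(W, full_matrices=False)  # columns sorted by singular value
    return [U[:, i * r:(i + 1) * r] for i in range(E)]

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 24))  # stand-in for a pretrained DiT weight
blocks = disjoint_expert_subspaces(W, E=4, r=4)

# Cross-expert overlap U_i^T U_j (i != j) vanishes up to floating-point error.
overlap = max(np.abs(blocks[i].T @ blocks[j]).max()
              for i in range(4) for j in range(4) if i != j)
print(overlap < 1e-10)  # True

# Projecting an upstream gradient G onto each block yields non-overlapping pieces
# whose squared norms sum to at most ||G||^2 (Bessel's inequality).
G = rng.standard_normal(32)
pieces = [B.T @ G for B in blocks]
print(sum(p @ p for p in pieces) <= G @ G)  # True
```

Because the blocks are mutually orthogonal, each expert sees a different projection of the same upstream gradient, so experts with identical initial weights still receive different updates.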

Abstract

LoRA fine-tuning of diffusion transformers (DiT) on multi-style data suffers from style bleed: a single low-rank residual cannot represent several distinct artist fingerprints, and the optimizer converges to their average. Mixture-of-experts LoRA in the HydraLoRA style replaces the up-projection with $E$ heads under a router, but when every expert is zero-initialized the router receives identical gradient from each head and remains at the uniform prior. The experts then evolve permutation-symmetrically, and the network trains as a single rank-$r$ LoRA at $E \times$ the cost. We present Ortho-Hydra, a re-parameterisation that combines an OFT-style Cayley-orthogonal shared basis with per-expert disjoint output subspaces carved from the top-$(Er)$ left singular vectors of the pretrained weight. Disjointness makes the router's per-expert score non-degenerate at…
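The symmetry argument in the abstract can be checked numerically. The sketch below is a simplified toy, not the paper's parameterization: it omits the Cayley-orthogonal basis and assumes a softmax router and plain SGD. The baseline uses free zero-initialized $d \times r$ heads $B_i$; the hypothetical disjoint variant uses zero-initialized $r \times r$ coefficients $C_i$ inside fixed blocks $U_i$ of the top-$(Er)$ left singular vectors. After one step, the function returns the router's logit gradient. `simulate` and `router_logit_grad` are illustrative names.

```python
import numpy as np

def router_logit_grad(scores, gates):
    # Softmax router: dL/dlogit_i = g_i * (s_i - sum_j g_j * s_j),
    # where s_i = <dL/dy, expert_i(x)> is expert i's contribution to the loss gradient.
    return gates * (scores - gates @ scores)

def simulate(disjoint, d=16, k=16, r=2, E=4, lr=0.1, seed=0):
    """Router logit gradients after one SGD step on zero-initialized expert heads."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((d, k))                       # stand-in for a pretrained weight
    U = np.linalg.svd(W, full_matrices=False)[0]          # left singular vectors, largest first
    U_blocks = [U[:, i * r:(i + 1) * r] for i in range(E)]  # disjoint r-dim output subspaces
    A = rng.standard_normal((r, k))                       # shared down-projection
    gates = np.full(E, 1.0 / E)                           # router at its uniform prior
    # Baseline heads: free d x r up-projections. Disjoint heads: r x r coefficients
    # inside the fixed subspace U_i. Both start at zero.
    heads = [np.zeros((r, r) if disjoint else (d, r)) for _ in range(E)]

    def expert_out(i, h):
        return U_blocks[i] @ (heads[i] @ h) if disjoint else heads[i] @ h

    # One SGD step on every head for a random input and upstream gradient G = dL/dy.
    h, G = A @ rng.standard_normal(k), rng.standard_normal(d)
    for i in range(E):
        grad = np.outer(U_blocks[i].T @ G, h) if disjoint else np.outer(G, h)
        heads[i] -= lr * gates[i] * grad

    # Router gradient on a fresh sample after that step.
    h2, G2 = A @ rng.standard_normal(k), rng.standard_normal(d)
    scores = np.array([G2 @ expert_out(i, h2) for i in range(E)])
    return router_logit_grad(scores, gates)

print(np.allclose(simulate(disjoint=False), 0.0))  # True: router stays at the uniform prior
print(simulate(disjoint=True))                     # expert-specific, nonzero entries
```

In the baseline every head receives the same update, so all per-expert scores coincide and the softmax gradient cancels exactly. Restricting each head to its own orthogonal block breaks that permutation symmetry after a single step.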

Code & Models

Repositories

sorryhyun/anima_lora (GitHub)