Controlla: Learning Controllability via Graph-Constrained Latent Geometry
Jamuna S. Murthy, Amin Karimi Monsefi, Rajiv Ramnath

TL;DR
Controlla is a graph-constrained latent-geometry framework that targets better controllability and identity preservation in multimodal generation by aligning structured latent factors with graph priors through optimal transport.
Contribution
The paper proposes a modular framework that models controllability explicitly through structured latent geometry and graph priors, with the aim of improving multimodal control and cross-modal consistency.
Findings
Reports improved controllability and identity preservation in multimodal generation tasks.
Constructs AffectHuman-43K, a leakage-aware multimodal benchmark for reference-grounded affective control.
Reports that the proposed method is robust and extensible.
Abstract
Controllable multimodal generation is commonly formulated as an inference-time conditioning problem using prompts, guidance, or auxiliary modules. While effective, such approaches do not explicitly structure how semantic attributes evolve, which can lead to identity drift and inconsistent cross-modal behavior. We propose Controlla, a modular factorized-control framework that treats controllability as a property of structured latent geometry. Controlla learns identity and attribute factors from multimodal inputs and aligns them with graph priors using graph-constrained optimal transport, encouraging attributes to follow graph-consistent trajectories while preserving reference identity. To evaluate this setting, we construct AffectHuman-43K, a leakage-aware multimodal benchmark for reference-grounded affective control, and introduce geometry-aware metrics for trajectory consistency and…
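The abstract does not spell out how the graph-constrained optimal transport step is formulated, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes a small attribute graph, adds a hop-distance penalty to a Euclidean cost so that mass prefers graph-adjacent attribute nodes, and solves the resulting problem with entropic (Sinkhorn) optimal transport. All names (`shortest_path_hops`, `sinkhorn`, `lam`, `source_nodes`), the three-node chain graph, and the toy embeddings are hypothetical.

```python
import numpy as np


def shortest_path_hops(adjacency):
    """All-pairs hop distances on an unweighted graph (Floyd-Warshall)."""
    n = adjacency.shape[0]
    dist = np.where(adjacency > 0, 1.0, np.inf)
    np.fill_diagonal(dist, 0.0)
    for k in range(n):
        # Relax every pair (i, j) through intermediate node k.
        dist = np.minimum(dist, dist[:, k:k + 1] + dist[k:k + 1, :])
    return dist


def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Entropic OT plan between histograms a (rows) and b (columns)."""
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # match column marginals
        u = a / (K @ v)              # match row marginals
    return u[:, None] * K * v[None, :]


# Hypothetical attribute graph: a 3-node chain 0 - 1 - 2.
adjacency = np.array([[0, 1, 0],
                      [1, 0, 1],
                      [0, 1, 0]])
hops = shortest_path_hops(adjacency)

# Hypothetical embeddings: one 2-D prototype per graph node,
# and four attribute latents to be aligned with those nodes.
node_emb = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
latents = np.array([[0.1, 0.0], [0.9, 0.1], [1.2, -0.1], [1.9, 0.0]])
source_nodes = np.array([0, 1, 1, 2])  # assumed current node of each latent

# Cost = squared Euclidean distance + lam * graph hop distance.
# The second term is the assumed "graph constraint".
lam = 0.5
sq_dist = ((latents[:, None, :] - node_emb[None, :, :]) ** 2).sum(axis=-1)
cost = sq_dist + lam * hops[source_nodes]

a = np.full(latents.shape[0], 1.0 / latents.shape[0])   # uniform over latents
b = np.full(node_emb.shape[0], 1.0 / node_emb.shape[0])  # uniform over nodes
plan = sinkhorn(cost, a, b)
```

In this sketch, `plan[i, j]` is the mass moved from latent `i` to graph node `j`. Raising `lam` makes multi-hop jumps more expensive, which is one simple way to encourage transitions that follow graph edges.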
