A Pattern Language for Resilient Visual Agents
Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

TL;DR
This paper proposes an architectural pattern language for visual agents in enterprise systems. It reconciles the deterministic, real-time responses that control loops require with the high latency and non-determinism of vision models.
Contribution
It introduces four specific design patterns to structure visual agents for improved resilience and performance in enterprise environments.
Findings
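Four architectural design patterns are proposed: Hybrid Affordance Integration, Adaptive Visual Anchoring, Visual Hierarchy Synthesis, and Semantic Scene Graph.
The patterns separate fast, deterministic reflexes from slow, probabilistic supervision.
The approach targets improved system resilience and real-time performance.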
Abstract
Integrating multimodal foundation models into enterprise ecosystems presents a fundamental software architecture challenge. Architects must balance competing quality attributes: the high latency and non-determinism of vision-language-action (VLA) models versus the strict determinism and real-time performance required by enterprise control loops. In this study, we propose an architectural pattern language for visual agents that separates fast, deterministic reflexes from slow, probabilistic supervision. The language consists of four architectural design patterns: (1) Hybrid Affordance Integration, (2) Adaptive Visual Anchoring, (3) Visual Hierarchy Synthesis, and (4) Semantic Scene Graph.
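The separation of fast, deterministic reflexes from slow, probabilistic supervision can be illustrated with a minimal sketch. This is not the authors' implementation, and it does not reproduce any of the four named patterns. All names here (`Anchor`, `ReflexLoop`, `Supervisor`, `run`, the `locate` callback, and the `period` parameter) are hypothetical and chosen for illustration. The supervisor is called synchronously every few ticks to keep the example deterministic; a real system would presumably run it concurrently so that model latency never blocks the reflex path.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]
Action = Tuple[str, int, int]


@dataclass(frozen=True)
class Anchor:
    """Hypothetical cached target: screen coordinates of a UI element."""
    x: int
    y: int
    version: int = 0


class ReflexLoop:
    """Fast path: deterministic, reads only the cached anchor, never calls the model."""

    def __init__(self, anchor: Anchor) -> None:
        self.anchor = anchor

    def step(self) -> Action:
        # Constant-time decision based on the last known-good anchor.
        return ("click", self.anchor.x, self.anchor.y)


class Supervisor:
    """Slow path: wraps a probabilistic locator, consulted every `period` ticks."""

    def __init__(self, locate: Callable[[], Optional[Point]], period: int) -> None:
        self.locate = locate  # stand-in for a vision model call (assumption)
        self.period = period

    def maybe_update(self, tick: int, anchor: Anchor) -> Anchor:
        if tick % self.period != 0:
            return anchor
        guess = self.locate()
        if guess is None:
            # The model produced nothing usable: keep the last known-good anchor.
            return anchor
        return Anchor(guess[0], guess[1], anchor.version + 1)


def run(ticks: int, locate: Callable[[], Optional[Point]], period: int = 3) -> List[Action]:
    """Run the reflex loop for `ticks` steps, letting the supervisor refresh the anchor."""
    reflex = ReflexLoop(Anchor(10, 20))
    supervisor = Supervisor(locate, period)
    actions: List[Action] = []
    for tick in range(1, ticks + 1):
        actions.append(reflex.step())  # the reflex always acts first
        reflex.anchor = supervisor.maybe_update(tick, reflex.anchor)
    return actions
```

In this sketch the reflex path produces an action on every tick regardless of whether the supervisor succeeds, and a failed or empty model response leaves the previous anchor in place. That fallback is one simple way to keep non-determinism out of the control loop; it is a design choice of the sketch, not a claim about the paper.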
