Vision Language Models for Optimization-Driven Intent Processing in Autonomous Networks
Tasnim Ahmed, Yifan Zhu, Salimur Choudhury

TL;DR
This paper evaluates the ability of vision-language models to generate optimization code from network sketches for intent-based networking, highlighting current limitations and potential for practical deployment.
Contribution
It introduces IntentOpt, a benchmark for assessing VLMs on network optimization tasks, and provides a comprehensive analysis of their performance with multimodal inputs.
Findings
Visual parameter extraction reduces success rates by 12-21 percentage points (pp).
Program-of-thought prompting decreases performance by up to 13 pp.
GPT-5-Mini outperforms open-source models, achieving 75% success.
Abstract
Intent-Based Networking (IBN) allows operators to specify high-level network goals rather than low-level configurations. While recent work demonstrates that large language models can automate configuration tasks, a distinct class of intents requires generating optimization code to compute provably optimal solutions for traffic engineering, routing, and resource allocation. Current systems assume text-based intent expression, requiring operators to enumerate topologies and parameters in prose. Network practitioners naturally reason about structure through diagrams, yet whether Vision-Language Models (VLMs) can process annotated network sketches into correct optimization code remains unexplored. We present IntentOpt, a benchmark of 85 optimization problems across 17 categories, evaluating four VLMs (GPT-5-Mini, Claude-Haiku-4.5, Gemini-2.5-Flash, Llama-3.2-11B-Vision) under three…
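To make "optimization code" concrete, the following is a minimal sketch of the kind of program a routing intent such as "send traffic from A to D over the cheapest path" might translate into. Everything in it is an assumption made for illustration: the four-node topology, the link costs, and the function name `min_cost_path` are hypothetical and do not come from the paper or from IntentOpt. The problem formats and target solver used by the benchmark are not described in this excerpt; here Dijkstra's algorithm stands in as a simple method that returns a provably optimal path when link costs are non-negative.

```python
import heapq

def min_cost_path(links, source, target):
    """Return (total_cost, path) for the cheapest path, or (inf, []) if unreachable.

    `links` maps (node_u, node_v) pairs to non-negative link costs.
    Links are treated as bidirectional (an assumption of this sketch).
    """
    # Build an adjacency list from the link table.
    graph = {}
    for (u, v), cost in links.items():
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))

    best = {source: 0}   # cheapest known cost to reach each node
    prev = {}            # predecessor of each node on its cheapest path
    heap = [(0, source)]

    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for nxt, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))

    if target not in best:
        return float("inf"), []

    # Walk predecessors back from the target to recover the path.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return best[target], path[::-1]

# Hypothetical topology: the parameters an operator would otherwise
# enumerate in prose or annotate on a sketch.
links = {
    ("A", "B"): 4,
    ("A", "C"): 1,
    ("C", "B"): 2,
    ("B", "D"): 5,
    ("C", "D"): 8,
}

cost, path = min_cost_path(links, "A", "D")
print(cost, path)  # 8 ['A', 'C', 'B', 'D']
```

In this sketch the `links` table is exactly the information a model would have to read off a diagram, so a single misread cost or missing edge changes the answer even if the solving logic is correct.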
Taxonomy
Topics: Software-Defined Networks and 5G · Network Packet Processing and Optimization · Caching and Content Delivery
