Divide, Harmonize, Then Conquer It: Shooting Multi-Commodity Flow Problems with Multimodal Language Models
Xinyu Yuan, Yan Qiao, Zonghui Wang, Wenzhi Chen

TL;DR
Pram is an ML-based approach that uses multimodal language models to solve multi-commodity flow problems: it divides the problem into local subproblems, solves each one locally, and harmonizes the results globally, reaching near-optimal solutions 10 to 100 times faster than traditional solvers while staying robust to network failures and flow bursts.
Contribution
This paper introduces Pram, the first ML-based method leveraging multimodal language models for multi-commodity flow problems, combining problem division, local solving, and global harmonization.
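For reference, a multi-commodity flow problem is commonly written as a linear program. The block below is one standard path-based form with a maximum-total-flow objective; the symbols ($K$, $P_k$, $f_p$, $d_k$, $c_e$) are notation assumed here for illustration, and this summary does not state which objective or formulation Pram itself targets.

```latex
% Assumed notation: K = commodities, P_k = candidate paths of commodity k,
% f_p = flow on path p, d_k = demand of commodity k, c_e = capacity of edge e.
\begin{align}
\max_{f} \quad & \sum_{k \in K} \sum_{p \in P_k} f_p \\
\text{s.t.} \quad & \sum_{p \in P_k} f_p \le d_k && \forall k \in K \\
& \sum_{k \in K} \sum_{\substack{p \in P_k \\ e \in p}} f_p \le c_e && \forall e \in E \\
& f_p \ge 0 && \forall k \in K,\ p \in P_k
\end{align}
```

The capacity constraint is the only one that couples different commodities, which is why splitting the problem into local subproblems requires a separate step to restore global consistency.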
Findings
Achieves near-optimal solutions comparable to linear programming solvers.
Runs 10 to 100 times faster than traditional solvers.
Maintains robustness under network failures and flow bursts.
Abstract
The multi-commodity flow (MCF) problem is a fundamental topic in network flow and combinatorial optimization, with broad applications in transportation, communication, and logistics. The rapid expansion of allocation systems makes it hard for existing optimization engines to balance optimality and tractability. In this paper, we present Pram, the first ML-based method that leverages the reasoning power of multimodal language models (MLMs) to address this trade-off, a pressing need for service providers. As part of our proposal, Pram (i) quickly computes high-quality allocations by dividing the original problem into local subproblems, each of which is then solved by an MLM-powered "agent", and (ii) ensures global consistency by harmonizing these subproblems via a multi-agent reinforcement learning algorithm. Theoretically, we show that Pram, which learns to…
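The divide, solve-locally, harmonize pattern described in the abstract can be illustrated with a toy sketch. This is not Pram's method: the MLM-powered agent is replaced here by a greedy local allocator and the multi-agent reinforcement learning step by proportional scaling, and every name below (`divide_by_source`, `solve_local`, `harmonize`, the single-path commodity format) is a hypothetical choice made for illustration.

```python
from collections import defaultdict


def path_edges(path):
    """Return the directed edges along a node path, e.g. [a, c, d] -> [(a, c), (c, d)]."""
    return list(zip(path, path[1:]))


def divide_by_source(commodities):
    """Divide: group commodities by source node (one hypothetical agent per source)."""
    groups = defaultdict(list)
    for name, spec in commodities.items():
        groups[spec["path"][0]].append(name)
    return dict(groups)


def solve_local(names, commodities, capacity):
    """Local solve (greedy stand-in for an agent): ignores commodities of other groups."""
    residual = dict(capacity)
    alloc = {}
    for name in names:
        edges = path_edges(commodities[name]["path"])
        flow = min([commodities[name]["demand"]] + [residual[e] for e in edges])
        for e in edges:
            residual[e] -= flow
        alloc[name] = flow
    return alloc


def harmonize(alloc, commodities, capacity):
    """Harmonize (stand-in for the learned step): scale flows so no edge is overloaded."""
    load = defaultdict(float)
    for name, flow in alloc.items():
        for e in path_edges(commodities[name]["path"]):
            load[e] += flow
    result = {}
    for name, flow in alloc.items():
        edges = path_edges(commodities[name]["path"])
        ratios = [capacity[e] / load[e] for e in edges if load[e] > 0]
        # Scaling by the tightest capacity/load ratio keeps every edge within capacity.
        result[name] = flow * min([1.0] + ratios)
    return result


def divide_solve_harmonize(commodities, capacity):
    alloc = {}
    for names in divide_by_source(commodities).values():
        alloc.update(solve_local(names, commodities, capacity))
    return harmonize(alloc, commodities, capacity)


# Toy instance: two sources share the bottleneck edge (c, d).
capacity = {("a", "c"): 10.0, ("b", "c"): 10.0, ("c", "d"): 10.0}
commodities = {
    "k1": {"path": ["a", "c", "d"], "demand": 8.0},
    "k2": {"path": ["b", "c", "d"], "demand": 8.0},
}
flows = divide_solve_harmonize(commodities, capacity)
```

In this instance each local solve grants its commodity the full 8 units, which together would put 16 units on the shared edge of capacity 10; the harmonization step scales both flows to 5 units so that the global capacity constraint holds.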
Peer Reviews
Decision: ICLR 2026 Poster
1. Using multimodal LLMs as distributed solvers for MCF is genuinely new. Prior work on neural optimization largely relied on GNNs or RL; this paper reframes the task as multimodal reasoning over partitioned problems, which feels like a meaningful step forward.
2. The paper’s theoretical discussion, particularly the connection between convexity in MCF and the ability of MLMs to simulate gradient descent, is well argued and gives the approach more credibility than most ML-for-optimization papers.
1. The multimodal input design (topology as image + demand as text) is acknowledged to introduce layout sensitivity and encoding bias.
2. Although faster than LP in inference, fine-tuning large MLMs still requires considerable resources.
1. **Novelty.** To the best of my knowledge, this paper is among the first to leverage the reasoning power of modern Multimodal Language Models (MLMs) for a classic combinatorial optimization problem (MCF). It bridges the gap between large-scale AI models and network optimization.
2. **Significant Performance (Speed and Scalability).** Based on the results, PRAM is significantly faster on large-scale topologies. The ablation study confirms that the partitioning is critical to this scalability.
3
1. **Cost in Training.** From my understanding, the superior scalability of PRAM is achieved by truncation of backbone layers and parameter-efficient adaptation. This can potentially be a problem during deployment.
2. **Memory overhead.** Inputs to the model are images of the subnetworks, which are sensitive to the partition parameters. I wonder if this can lead to significant memory overhead, especially when different subproblems (nodes) have similar network views that are processed redundantly.
1. Well written and clear
2. A novel method based on decomposition
3. Strong empirical results, with interesting robust predictions
- the neural architecture is quite complex
Taxonomy
Topics: Software-Defined Networks and 5G · Advanced Optical Network Technologies · Vehicle Routing Optimization Methods
