SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making
Zeyu Li, Lei Li

TL;DR
SKG-VLA introduces a structured scene knowledge graph approach for multimodal complaint decision making, enhancing reasoning accuracy and robustness by explicitly modeling scene semantics and dependencies.
Contribution
It proposes a Scene Knowledge Graph (SKG) framework and an accompanying training pipeline for multimodal complaint understanding and decision making.
Findings
Improves policy-grounded reasoning accuracy.
Enhances robustness under incomplete evidence.
Generalizes well to long-tail cases.
Abstract
Decision making in large-scale complaint handling systems increasingly relies on heterogeneous evidence, including complaint narratives, screenshots, order metadata, historical interactions, and platform policies. Existing complaint understanding systems mainly perform shallow classification or template matching over isolated modalities, while underutilizing explicit scene structure, rule knowledge, and cross-evidence dependencies. To address this limitation, we present SKG-VLA for multimodal complaint decision making. The core idea is to model each case as a structured complaint scene and represent its decision-relevant semantics with a Scene Knowledge Graph (SKG), which organizes complaint entities, evidence items, policy clauses, temporal events, transactional states, and action-relevant relations into a unified graph. Based on SKG, we build a data synthesis pipeline that…
