Vision-Language-Model-Guided Differentiable Ray Tracing for Fast and Accurate Multi-Material RF Parameter Estimation

Zerui Kang; Yishen Lim; Zhouyou Gu; Seung-Woo Ko; Tony Q.S. Quek; Jihong Park

arXiv:2601.18242·cs.CV·April 2, 2026

TL;DR

This paper introduces a VLM-guided framework that accelerates and stabilizes multi-material RF parameter estimation in differentiable ray tracing, achieving faster convergence and higher accuracy with fewer measurements.

Contribution

The framework uses a vision-language model to supply semantic material priors and to select informative transmitter/receiver placements, improving the efficiency and reliability of RF parameter estimation in complex scenes.
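
As a rough illustration of the semantic-prior step, the sketch below maps material labels (as a VLM might return them) to conductivity initializations. It uses the frequency-dependent form found in ITU-R P.2040, where conductivity is c * f^d with f in GHz. The table values, the function names `conductivity_prior` and `priors_from_vlm_labels`, and the fallback rule are assumptions made for this example, not the paper's implementation; the coefficients should be checked against the current Recommendation before use.

```python
# Illustrative coefficients (a, b, c, d) in the form used by ITU-R P.2040:
#   relative permittivity = a * f_GHz**b
#   conductivity (S/m)    = c * f_GHz**d
# The numbers are placeholders for this sketch and should be verified.
ITU_STYLE_TABLE = {
    "concrete": (5.31, 0.0, 0.0326, 0.8095),
    "brick": (3.75, 0.0, 0.038, 0.0),
    "plasterboard": (2.94, 0.0, 0.0116, 0.7076),
    "wood": (1.99, 0.0, 0.0047, 1.0718),
    "glass": (6.27, 0.0, 0.0043, 1.1925),
}


def conductivity_prior(label, freq_ghz):
    """Return a conductivity initialization in S/m for one material label."""
    _a, _b, c, d = ITU_STYLE_TABLE[label]
    return c * freq_ghz ** d


def priors_from_vlm_labels(labels, freq_ghz, fallback="concrete"):
    """Map free-text labels to priors; unknown labels use the fallback row.

    The fallback rule is a hypothetical choice for this example.
    """
    priors = {}
    for raw in labels:
        label = raw.strip().lower()
        row = label if label in ITU_STYLE_TABLE else fallback
        priors[label] = conductivity_prior(row, freq_ghz)
    return priors


if __name__ == "__main__":
    # Labels a VLM might return for an indoor scene (hypothetical).
    for name, sigma in priors_from_vlm_labels(["Concrete", "glass", "wood"], 3.5).items():
        print(f"{name}: {sigma:.4f} S/m")
```

Keeping the table separate from the lookup makes it easy to swap in verified coefficients or additional material rows without touching the rest of the pipeline.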

Findings

01. Achieves 2-4× faster convergence compared to baselines.
02. Attains 10-100× lower final error in RF parameters.
03. Reduces measurement requirements through VLM-guided placement.

Abstract

Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and costly under limited measurements. This paper proposes a vision-language-model (VLM) guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM further selects informative transmitter/receiver placements that promote diverse, material-discriminative paths. Starting from these priors, the DRT performs gradient-based refinement using measured received signal strengths. Experiments in NVIDIA Sionna on indoor scenes show 2-4× faster convergence and…
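
The refinement loop described in the abstract can be sketched with a toy stand-in for the ray tracer. The example below is not Sionna and not the paper's forward model: it assumes a hypothetical linear sensitivity matrix `A` (dB of loss per S/m for each link and material), a hand-written gradient in place of automatic differentiation, and a log-conductivity parameterization chosen here to keep conductivities positive. All values, including the "true" conductivities and both starting points, are invented for illustration.

```python
import math

# Hypothetical sensitivities: A[i][m] is the loss in dB on link i per S/m of
# conductivity in material m. A real DRT engine would trace paths instead.
A = [[100.0, 20.0], [30.0, 120.0], [60.0, 60.0]]
TX_POWER_DB = 0.0


def predict_rss(sigma):
    """Toy forward model: received signal strength (dB) on each link."""
    return [TX_POWER_DB - sum(a * s for a, s in zip(row, sigma)) for row in A]


def refine(sigma_init, measured, step=0.01, tol=1e-6, max_iters=5000):
    """Gradient descent on theta = ln(sigma) for the mean squared dB error.

    Returns the refined conductivities and the number of updates applied.
    """
    theta = [math.log(s) for s in sigma_init]
    n = len(measured)
    for it in range(max_iters):
        sigma = [math.exp(t) for t in theta]
        residual = [p - m for p, m in zip(predict_rss(sigma), measured)]
        loss = sum(r * r for r in residual) / n
        if loss < tol:
            return sigma, it
        # dL/dtheta_m = (2/n) * sum_i residual_i * (-A[i][m] * sigma_m)
        grad = [
            (2.0 / n) * sigma[m] * sum(-row[m] * r for row, r in zip(A, residual))
            for m in range(len(sigma))
        ]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return [math.exp(t) for t in theta], max_iters


# Invented ground truth and noise-free "measurements" for the toy problem.
TRUE_SIGMA = [0.09, 0.04]
measured_rss = predict_rss(TRUE_SIGMA)

# Start near the truth (standing in for a table-based prior) versus a
# generic small value for every material.
sigma_prior, n_prior = refine([0.075, 0.05], measured_rss)
sigma_generic, n_generic = refine([1e-3, 1e-3], measured_rss)

if __name__ == "__main__":
    print("from prior:  ", sigma_prior, "updates:", n_prior)
    print("from generic:", sigma_generic, "updates:", n_generic)
```

In this toy setup the run started near the truth needs fewer updates than the run started from a generic small value, because the log-domain gradient scales with the current conductivity and is small at first. The iteration counts are a property of the toy problem only and say nothing about the speedups reported in the paper.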
