Vision-Language-Model-Guided Differentiable Ray Tracing for Fast and Accurate Multi-Material RF Parameter Estimation
Zerui Kang, Yishen Lim, Zhouyou Gu, Seung-Woo Ko, Tony Q.S. Quek, Jihong Park

TL;DR
This paper introduces a VLM-guided framework that accelerates and stabilizes multi-material RF parameter estimation in differentiable ray tracing, achieving faster convergence and higher accuracy with fewer measurements.
Contribution
It integrates vision-language models (VLMs) that supply semantic material priors and select informative transmitter/receiver placements, improving the efficiency and reliability of RF parameter estimation in complex scenes.
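The step that turns a VLM's material label into a numeric prior can be pictured as a table lookup. The following is a minimal sketch. It assumes the ITU-R table is the frequency-dependent model of Recommendation ITU-R P.2040 (ε_r = a·f^b, σ = c·f^d with f in GHz, the same family Sionna's ITU radio materials are based on). The `material_prior` helper, the label strings, and the example scene objects are hypothetical. The coefficients are quoted for a few materials only and should be checked against the recommendation before use.

```python
"""Map VLM material labels to ITU-R P.2040-style (eps_r, sigma) priors (sketch)."""

# Assumed ITU-R P.2040 coefficients (a, b, c, d):
#   relative permittivity  eps_r = a * f**b
#   conductivity (S/m)     sigma = c * f**d,   with f in GHz
# Verify against the recommendation before relying on these values.
ITU_P2040 = {
    "concrete": (5.24, 0.0, 0.0462, 0.7822),
    "brick": (3.91, 0.0, 0.0238, 0.16),
    "plasterboard": (2.73, 0.0, 0.0085, 0.9395),
    "wood": (1.99, 0.0, 0.0047, 1.0718),
    "glass": (6.31, 0.0, 0.0036, 1.3394),
    "metal": (1.0, 0.0, 1e7, 0.0),
}


def material_prior(label: str, freq_ghz: float) -> tuple[float, float]:
    """Return (eps_r, sigma) for a VLM-predicted material label (hypothetical helper)."""
    key = label.strip().lower()
    if key not in ITU_P2040:
        raise KeyError(f"no ITU-R entry for material {label!r}")
    a, b, c, d = ITU_P2040[key]
    return a * freq_ghz ** b, c * freq_ghz ** d


if __name__ == "__main__":
    # Hypothetical VLM output: one material label per scene object.
    vlm_labels = {"wall_north": "Concrete", "window_1": "glass", "door": "wood"}
    for obj, label in vlm_labels.items():
        eps_r, sigma = material_prior(label, freq_ghz=3.5)
        print(f"{obj}: {label} -> eps_r={eps_r:.2f}, sigma={sigma:.4f} S/m")
```

Unknown labels raise an error instead of silently falling back to a default. That way a misparsed material is surfaced before it becomes a poor initialization.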
Findings
Achieves 2-4× faster convergence compared to baselines.
Attains 10-100× lower final error in RF parameters.
Reduces measurement requirements through VLM-guided placement.
Abstract
Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and costly under limited measurements. This paper proposes a vision-language-model (VLM) guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM further selects informative transmitter/receiver placements that promote diverse, material-discriminative paths. Starting from these priors, the DRT performs gradient-based refinement using measured received signal strengths. Experiments in NVIDIA Sionna on indoor scenes show 2-4× faster convergence and…
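The gradient-based refinement from priors can be illustrated with a deliberately simplified stand-in for the DRT. This is a minimal sketch, not Sionna and not the authors' implementation. It assumes that each hypothetical Tx/Rx link's measured excess loss (dB) is the sum, over penetrated walls, of thickness times the low-loss plane-wave attenuation α ≈ σ·η₀/(2√ε_r). Here ε_r is held fixed and only the conductivities σ are estimated. Several inputs are also assumptions: the ITU-R coefficients, the link geometry, the "true" conductivities, and the use of plain projected gradient descent (the optimizer is not stated here). Because this toy loss is convex, the initialization only changes how many iterations are needed. The abstract's point is that the real DRT loss is more sensitive to it than that.

```python
"""Toy multi-material conductivity refinement from excess-loss measurements (sketch)."""
import math

ETA0 = 376.730313                         # free-space wave impedance (ohms)
DB_PER_NEPER = 20.0 * math.log10(math.e)  # ~8.686 dB per neper


def itu_sigma(c: float, d: float, freq_ghz: float) -> float:
    """Assumed ITU-R P.2040-style conductivity model sigma = c * f**d (f in GHz)."""
    return c * freq_ghz ** d


# Relative permittivity (assumed known) and ITU-style conductivity prior at 3.5 GHz.
EPS_R = {"concrete": 5.24, "wood": 1.99}
PRIOR = {"concrete": itu_sigma(0.0462, 0.7822, 3.5), "wood": itu_sigma(0.0047, 1.0718, 3.5)}
MATERIALS = list(EPS_R)

# Hypothetical Tx/Rx links: metres of each material penetrated by the path.
LINKS = [
    {"concrete": 0.2, "wood": 0.0},
    {"concrete": 0.0, "wood": 0.04},
    {"concrete": 0.2, "wood": 0.04},
    {"concrete": 0.4, "wood": 0.04},
]


def sensitivity(link: dict, m: str) -> float:
    """dB of excess loss per S/m of conductivity of material m on this link
    (low-loss attenuation alpha ~= sigma * ETA0 / (2 * sqrt(eps_r)))."""
    return DB_PER_NEPER * link[m] * ETA0 / (2.0 * math.sqrt(EPS_R[m]))


A = [[sensitivity(link, m) for m in MATERIALS] for link in LINKS]


def excess_loss_db(sigma: list[float]) -> list[float]:
    """Forward model: per-link excess loss in dB, linear in the conductivities."""
    return [sum(a * s for a, s in zip(row, sigma)) for row in A]


def refine(sigma0, measured_db, tol=1e-6, max_iters=5000):
    """Projected gradient descent on the summed squared dB error.

    Returns (sigma_estimate, iterations_used)."""
    # Step 1/(2*||A||_F^2) is <= 1/L for this quadratic loss, so the loss never increases.
    step = 1.0 / (2.0 * sum(a * a for row in A for a in row))
    sigma = list(sigma0)
    for it in range(max_iters):
        resid = [p - y for p, y in zip(excess_loss_db(sigma), measured_db)]
        if sum(r * r for r in resid) < tol:
            return sigma, it
        grad = [2.0 * sum(A[i][j] * resid[i] for i in range(len(A))) for j in range(len(sigma))]
        # Project onto sigma >= 0 so conductivities stay physical.
        sigma = [max(s - step * g, 0.0) for s, g in zip(sigma, grad)]
    return sigma, max_iters


if __name__ == "__main__":
    sigma_true = [0.17, 0.013]             # hypothetical ground truth (S/m)
    measured = excess_loss_db(sigma_true)  # noise-free synthetic "measurements"
    est_prior, n_prior = refine([PRIOR[m] for m in MATERIALS], measured)
    est_flat, n_flat = refine([1.0, 1.0], measured)
    print("from ITU prior:", [round(s, 4) for s in est_prior], n_prior, "iterations")
    print("from sigma=1:  ", [round(s, 4) for s in est_flat], n_flat, "iterations")
```

The link matrix must have full column rank for both conductivities to be identifiable. In this toy, that requirement is a loose analogue of choosing placements whose paths discriminate between materials.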
