Scalable Back-Propagation-Free Training of Optical Physics-Informed Neural Networks

Yequan Zhao; Xinling Yu; Xian Xiao; Zhixiong Chen; Ziyue Liu; Geza Kurczveil; Raymond G. Beausoleil; Sijia Liu; Zheng Zhang

arXiv:2502.12384 · cs.LG · February 10, 2026

TL;DR

This paper introduces a novel, scalable, back-propagation-free training framework for optical physics-informed neural networks, enabling real-time PDE solving on photonic hardware with significant speed and area advantages.

Contribution

It presents a new BP-free training method for PINNs using sparse-grid Stein derivatives and tensor-train optimization, along with a photonic accelerator design for scalable on-chip training.
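The back-propagation-free derivative idea can be illustrated in one dimension. The minimal Python sketch below assumes the network output f is replaced by its Gaussian-smoothed version f_σ(x) = E[f(x + σε)] with ε ~ N(0, 1). Stein's identity then gives f_σ'(x) = E[ε f(x + σε)]/σ and f_σ''(x) = E[(ε² - 1) f(x + σε)]/σ², so both derivatives need only forward evaluations of f. The expectation is computed with a 3-node Gauss-Hermite rule, the one-dimensional building block that a sparse grid combines across dimensions. The function name `stein_derivatives_1d` and the choice of a 3-node rule are illustrative assumptions, not the paper's implementation.

```python
import math
from typing import Callable, Tuple

# 3-node Gauss-Hermite rule for eps ~ N(0, 1): E[p(eps)] is computed exactly
# for every polynomial p of degree <= 5. (Illustrative choice, not the paper's grid.)
GH3_NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
GH3_WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def stein_derivatives_1d(f: Callable[[float], float], x: float,
                         sigma: float) -> Tuple[float, float]:
    """First and second derivatives of f_sigma(x) = E[f(x + sigma * eps)].

    Uses Stein's identity, so only forward evaluations of f are needed:
      f_sigma'(x)  = E[eps * f(x + sigma * eps)] / sigma
      f_sigma''(x) = E[(eps**2 - 1) * f(x + sigma * eps)] / sigma**2
    """
    d1 = 0.0
    d2 = 0.0
    for eps, w in zip(GH3_NODES, GH3_WEIGHTS):
        fx = f(x + sigma * eps)  # one forward pass per quadrature node
        d1 += w * eps * fx
        d2 += w * (eps * eps - 1.0) * fx
    return d1 / sigma, d2 / sigma ** 2


# f(x) = x**2 smooths to x**2 + sigma**2, whose derivatives are 2x and 2,
# so this prints approximately (3.0, 2.0).
print(stein_derivatives_1d(lambda t: t * t, 1.5, 0.1))
```

For low-degree polynomial integrands the quadrature is exact, so the only error is floating-point rounding; for a general network, more nodes (or a sparse grid in higher dimensions) trade extra forward passes for accuracy.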

Findings

01. Validated on PDE benchmarks, showing high accuracy

02. Demonstrated real-time training capability in pre-silicon simulations

03. Achieved significant reductions in chip area and latency

Abstract

Physics intelligence and digital twins often require rapid and repeated performance evaluation of various engineering systems (e.g., robots, autonomous vehicles, semiconductor chips) to enable (almost) real-time actions or decision making. This has motivated the development of accelerated partial differential equation (PDE) solvers, especially in resource-constrained scenarios where the PDE solvers are to be deployed on the edge. Physics-informed neural networks (PINNs) have shown promise in solving high-dimensional PDEs, but the training time on state-of-the-art digital hardware (e.g., GPUs) is still orders of magnitude longer than the latency required for enabling real-time decision making. Photonic computing offers a potential solution to this huge latency gap because of its ultra-high operation speed. However, the lack of photonic memory and the large device sizes prevent training…
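To make the PINN training objective concrete, here is a minimal Python sketch of a physics-informed loss for the 1D Poisson problem -u''(x) = π² sin(πx) on [0, 1] with u(0) = u(1) = 0, whose exact solution is sin(πx). The test problem, the name `pinn_loss`, and the use of a central finite difference for u'' are illustrative assumptions; the paper instead estimates derivatives with sparse-grid Stein estimators, and in a real PINN `u` would be a neural network whose weights minimize this loss.

```python
import math
from typing import Callable, Sequence


def pinn_loss(u: Callable[[float], float],
              source: Callable[[float], float],
              collocation: Sequence[float],
              h: float = 1e-3) -> float:
    """Physics-informed loss for -u''(x) = source(x) on [0, 1], u(0) = u(1) = 0.

    u'' is approximated by a central finite difference, used here only as a
    stand-in for the paper's derivative estimator.
    """
    residual = 0.0
    for x in collocation:
        u_xx = (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)
        residual += (-u_xx - source(x)) ** 2  # squared PDE residual
    residual /= len(collocation)
    boundary = u(0.0) ** 2 + u(1.0) ** 2      # penalty for boundary conditions
    return residual + boundary


source = lambda x: math.pi ** 2 * math.sin(math.pi * x)
points = [i / 10 for i in range(1, 10)]
print(pinn_loss(lambda x: math.sin(math.pi * x), source, points))  # close to 0
print(pinn_loss(lambda x: 0.0, source, points))                    # large
```

The loss vanishes (up to discretization error) only for functions that satisfy both the PDE and the boundary conditions, which is what lets a PINN be trained without labeled solution data.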

Peer Reviews

Decision · Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The proposed method substantially reduces the dimensionality of zeroth-order ONN training, so a realistic PINN model can be trained with a sampling-based ZO optimizer.
2. Compared with prior methods, it shows faster convergence and lower error.
3. The training process accounts for hardware non-idealities, e.g., quantization and noise.
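The zeroth-order (ZO) training mentioned in this review replaces back-propagated gradients with loss evaluations at random perturbations. The minimal Python sketch below uses a generic randomized forward-difference estimator, ĝ = (1/N) Σ_i [L(θ + μu_i) - L(θ)]/μ · u_i with u_i ~ N(0, I), to drive plain gradient descent. The names `zo_gradient` and `zo_train` and all hyperparameters are hypothetical, and the sketch omits the paper's tensor-train compression, which is what shrinks the perturbed dimension.

```python
import random
from typing import Callable, List

Loss = Callable[[List[float]], float]


def zo_gradient(loss: Loss, theta: List[float], num_samples: int,
                mu: float, rng: random.Random) -> List[float]:
    """Randomized forward-difference gradient estimate (forward passes only)."""
    base = loss(theta)
    grad = [0.0] * len(theta)
    for _ in range(num_samples):
        u = [rng.gauss(0.0, 1.0) for _ in theta]          # random direction
        perturbed = [t + mu * ui for t, ui in zip(theta, u)]
        scale = (loss(perturbed) - base) / (mu * num_samples)
        for i, ui in enumerate(u):
            grad[i] += scale * ui
    return grad


def zo_train(loss: Loss, theta: List[float], steps: int = 200, lr: float = 0.05,
             num_samples: int = 20, mu: float = 1e-3, seed: int = 0) -> List[float]:
    """Gradient descent driven by the ZO estimate (hypothetical settings)."""
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(steps):
        g = zo_gradient(loss, theta, num_samples, mu, rng)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta


target = [1.0, -2.0, 0.5, 3.0]
quad = lambda th: sum((a - b) ** 2 for a, b in zip(th, target))
print(quad(zo_train(quad, [0.0] * 4)))  # much smaller than the initial 14.25
```

The variance of this estimator grows with the number of trainable parameters, which is why reducing that dimension, as the reviewer notes in point 1, directly helps ZO convergence.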

Weaknesses

1. Besides training from scratch, if a pretrained digital PINN model is available, how does the mapping efficiency of the proposed method compare with that of prior methods? This is also a realistic deployment use case.
2. What is the detailed photonic accelerator setting, e.g., core size? I noticed that the numbers of training parameters in Table 3 differ; in particular, L2ight has very few parameters. What could be the reason? Is it possible to keep a similar parameter size by modifying the model settin…

Reviewer 02 · Rating 3 · Confidence 5

Strengths

* A complete evaluation from algorithm to chip that considers some realistic constraints, e.g., resolution limits.

Weaknesses

* First, neither of the paper's main methods is new; both are borrowed from other domains without many new contributions or modifications. For example, low-rank compression with tensor trains is not new, and the sparse-grid method is a standard technique in the PDE domain.
* Second, some claims are not accurate. For example, a photonic accelerator would not build a huge chip directly for a 128x128 array. Instead, it partitions the large weight matrix into smaller blocks and implements those on-chip.
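The reviewer's second point refers to blocking: a large weight matrix is partitioned into tiles that a small fixed-size photonic core processes one at a time, with partial products accumulated afterward. The minimal Python sketch below illustrates this for a generic matrix-vector product; the function name `blocked_matvec` and the core size `k` are assumptions for illustration, not the accelerator design in the paper. For a 128x128 matrix and k = 32, for example, the core would be reused (128/32)² = 16 times per product.

```python
from typing import List

Matrix = List[List[float]]


def blocked_matvec(w: Matrix, x: List[float], k: int) -> List[float]:
    """Compute y = W x by tiling W into k x k blocks, as a fixed k x k core would."""
    n_rows, n_cols = len(w), len(x)
    assert n_rows % k == 0 and n_cols % k == 0, "pad W to a multiple of k first"
    y = [0.0] * n_rows
    for bi in range(0, n_rows, k):        # block row
        for bj in range(0, n_cols, k):    # block column
            # One k x k tile times one length-k slice of x; partial sums accumulate in y.
            for i in range(bi, bi + k):
                acc = 0.0
                for j in range(bj, bj + k):
                    acc += w[i][j] * x[j]
                y[i] += acc
    return y


n = 8
w = [[float(i - 2 * j) for j in range(n)] for i in range(n)]
x = [float(j + 1) for j in range(n)]
print(blocked_matvec(w, x, 4))  # same result as the unblocked product
```

The trade-off is chip area against latency: a smaller core needs fewer devices but more sequential passes (or more cores in parallel) per layer.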

Reviewer 03 · Rating 6 · Confidence 4

Strengths

The paper addresses a highly relevant and challenging task, namely the training and scalability of optical neural networks, and proposes a numerical training framework accompanied by a photonic accelerator design. The proposed tensor-compressed zeroth-order optimization approach not only reduces the number of required MZIs but is also shown experimentally to benefit the convergence of the model. With the integration of Stein derivative estimation and the pro…
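The tensor-train (TT) compression credited here can be quantified by counting parameters. In a TT-matrix format, a weight of size (m_1⋯m_d) × (n_1⋯n_d) is stored as d cores of shape r_{k-1} × m_k × n_k × r_k with r_0 = r_d = 1, so it has Σ_k r_{k-1} m_k n_k r_k parameters instead of ∏m_k · ∏n_k. The minimal Python sketch below counts both; the factorization 128 = 4·4·8 and the ranks (1, 2, 2, 1) are illustrative assumptions, not the paper's configuration, and the sketch counts parameters only, not MZIs.

```python
from math import prod
from typing import Sequence


def tt_matrix_params(row_factors: Sequence[int], col_factors: Sequence[int],
                     ranks: Sequence[int]) -> int:
    """Parameters of a TT-matrix whose k-th core has shape
    (ranks[k], row_factors[k], col_factors[k], ranks[k + 1])."""
    d = len(row_factors)
    assert len(col_factors) == d and len(ranks) == d + 1
    assert ranks[0] == 1 and ranks[-1] == 1, "boundary TT ranks must be 1"
    return sum(ranks[k] * row_factors[k] * col_factors[k] * ranks[k + 1]
               for k in range(d))


def dense_params(row_factors: Sequence[int], col_factors: Sequence[int]) -> int:
    """Entries of the equivalent dense matrix."""
    return prod(row_factors) * prod(col_factors)


# Hypothetical 128x128 layer factorized as (4, 4, 8) x (4, 4, 8) with ranks (1, 2, 2, 1).
print(tt_matrix_params((4, 4, 8), (4, 4, 8), (1, 2, 2, 1)))  # 224
print(dense_params((4, 4, 8), (4, 4, 8)))                    # 16384
```

Under these assumed settings, the TT format stores about 73 times fewer trainable values, which is also the dimension a ZO optimizer must perturb.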

Weaknesses

Although the paper presents an optical inference engine, it lacks critical details regarding the overall architecture and computational performance. For example, the authors mention in the supplementary material that the phase $\phi$ is uniformly quantized to 8 bits in the simulation, but they do not comment on the rate at which such a bit resolution can be achieved. Even state-of-the-art optical devices face low SNRs, especially at high computational rates, which significantly affects the effectiv…
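Uniform 8-bit phase quantization means each phase can take one of 2⁸ = 256 evenly spaced values on [0, 2π), i.e., a step of 2π/256 ≈ 0.0245 rad and a worst-case rounding error of half that step. A minimal Python sketch, assuming phases are wrapped to [0, 2π) and rounded to the nearest level (the function name `quantize_phase` is hypothetical):

```python
import math


def quantize_phase(phi: float, bits: int = 8) -> float:
    """Uniformly quantize a phase to 2**bits levels on [0, 2*pi)."""
    levels = 2 ** bits
    step = 2.0 * math.pi / levels
    wrapped = phi % (2.0 * math.pi)          # map into [0, 2*pi)
    index = round(wrapped / step) % levels   # nearest level; 2*pi wraps back to 0
    return index * step


step = 2.0 * math.pi / 256
print(quantize_phase(1.0), step / 2)  # quantized phase and the worst-case error bound
```

Such a quantizer can be placed in a simulated forward pass to model limited control resolution; it says nothing about the modulation rate at which that resolution is achievable, which is the reviewer's concern.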

Taxonomy

Topics: Neural Networks and Reservoir Computing