Diffractive neural networks for mode-sorting with flexible detection regions

Kaden Bearne; Alexander Duplinskiy; Matthew J. Filipovich; and A. I. Lvovsky

arXiv:2508.20058·physics.optics·August 28, 2025


TL;DR

This paper introduces a diffractive optical neural network for mode-sorting that optimizes the detection regions during training, resulting in higher efficiency at the same crosstalk levels compared to traditional methods.

Contribution

The novel approach integrates output detection regions into the training of a diffractive neural network for mode-sorting, enhancing performance.

Findings

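01

Achieves higher efficiency than fixed-detection mode-sorting at the same crosstalk: in simulation, 58.5% versus 30% efficiency at 29% crosstalk for 25 HG modes and 3 phase plates.

02

Lets a single hyperparameter α trade efficiency against crosstalk; the simulated crosstalk saturates below 1% as α approaches 0.

03

Demonstrates the advantage of trainable detection regions in experiment as well as in simulation.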

Abstract

Mode-sorting is a procedure that decomposes a light field into a basis of transverse modes, directing each mode into a separate spatial location, allowing the constituent mode intensities to be measured simultaneously. We demonstrate a mode-sorter based on a diffractive optical neural network and show that it is advantageous to include the output detection regions into the trainable set of parameters of that network. This approach outperforms traditional mode-sorting methods, achieving higher efficiency for the same crosstalk levels.

Equations

Footnote to Eq. (1), earlier WMM update rule:

$\varphi_{p}\to\varphi_{p}-a\,\mathrm{sign}[\mathrm{Im}\{\Psi_{p}^{\mathrm{for}}[\Psi_{p}^{\mathrm{back}}]^{*}\}].$

$\varphi_{p}\to\varphi_{p}-\arg\{\Psi_{p}^{\mathrm{for}}[\Psi_{p}^{\mathrm{back}}]^{*}\}. \qquad (1)$

$\mathrm{Loss}=-\left|\int\Psi_{\text{out}}^{*}\,\Psi_{\text{output plane}}^{\mathrm{for}}\,dx\,dy\right|^{2}. \qquad (2)$

$\mathrm{Loss}=-\sum_{i=1}^{n}\left|\int[\Psi_{\text{out}}^{(i)}]^{*}\,\Psi^{\mathrm{for},(i)}\,dx\,dy\right|^{2}. \qquad (3)$

$I_{ij}=\int_{D_{j}}\left|\Psi_{\text{output plane}}^{(i)}\right|^{2}dx\,dy. \qquad (4)$

$\mathrm{Loss}_{\text{eff}}=-\frac{1}{n}\sum_{i=1}^{n}I_{ii}. \qquad (5)$

$\mathrm{Loss}_{\text{xtalk}}=\frac{1}{n}\sum_{i=1}^{n}\left(1-\frac{I_{ii}}{\sum_{j}I_{ij}}\right). \qquad (6)$

$\mathrm{Loss}=\alpha\,\mathrm{Loss}_{\text{eff}}+(1-\alpha)\,\mathrm{Loss}_{\text{xtalk}}. \qquad (7)$


Full text


Department of Physics, University of Oxford, Oxford, OX1 3PU, UK

[email protected]


1 Introduction

Light is an excellent medium for information as it is fast and has various degrees of freedom, such as spectral, temporal, polarization, etc. Encoding information in transverse light structure is particularly beneficial for free-space communication, as it is robust to losses, dispersion [1] and turbulence [2] and has a large information transfer capacity. While structuring beams arbitrarily is relatively straightforward using spatial light modulators (SLM) [3], demultiplexing a transverse light field into a given spatial mode basis, known as mode-sorting, has proven to be challenging. Mode-sorting has applications in communication [4], imaging [5], endoscopy [6] and optical machine learning [7]. Mode-sorting is also an integral part of the spatial demultiplexing passive superresolution technique [8, 9, 10, 11].

Initial mode-sorters relied on transformations performed by standard optical components. For example, this approach permits sorting Laguerre-Gaussian (LG) modes. These modes carry both a radial order and an orbital angular momentum (OAM). OAM is associated with a helical phase structure, which can be converted into a transverse phase ramp using the log-polar transformation. When subsequently focused by a lens, beams with different OAM will focus to different spatial locations. However, these locations overlap for neighbouring OAM eigenvalues, resulting in a $\sim 20$ % crosstalk [12]. Complementary to this, the radial order sorting is achievable using a fractional Fourier transform. The experimental demonstration sorting 3 different radial orders had a mean crosstalk of 15 % [13]. Combining these two approaches allows for full sorting of the LG modal basis [14], with a crosstalk of 15.3 % for a 10-mode sorter.

LG modes have a one-to-one relationship with the Hermite-Gaussian (HG) modes and can be converted using a pair of cylindrical lenses, enabling HG mode-sorting using an LG mode-sorter [15]. Both of these approaches rely on a fractional Fourier transformation, which limits the modal separation to two radial orders. It therefore becomes difficult to sort larger numbers of modes as these operations must be cascaded to perform the appropriate transforms. In addition, mode-sorting approaches based on a mode’s distinct mathematical properties lack universality: currently known techniques are limited to a few bases such as LG or HG.

A newer family of approaches involves multi-plane light converters (MPLCs), offering lower crosstalk for sorting the same number of modes. MPLCs are constructed using a series of programmable, spatially-variable phase plates separated by free-space propagation to implement a customizable transformation of a given input field. Originally proposed in 2010 [16], this approach has been extended to mode multiplexing [17] and demultiplexing (mode-sorting) where the phase plates are reflections from an SLM. Fontaine et al. sorted 210 HG modes into individual optical fibres using an MPLC with 7 phase plates with a crosstalk of 19% [18]. This approach has since been extended to other modal bases such as LG, OAM, Zernike, and even arbitrary speckles [19]. For example, using a 5-plate sorter, up to 10 Zernike modes were separated with a 9.4% crosstalk or up to 36 modes with a 31.2% crosstalk [19]. Additionally, the condition of orthogonality can be relaxed and overlapping quantum states can be sorted at the expense of introducing a loss [20]. MPLCs can be taken to the limit of a single plane, acting as a modal beam-splitter sorting several modes with modest crosstalk [21], or alternatively, the limit of infinite planes in the case of 3D graded-index volumes [22]. Fabricating a 3D structure, however, poses an additional challenge in a practical experiment. Much effort has gone into characterizing [23, 24] and optimizing [25] the performance of MPLCs. Optimizing the phase patterns is a complex task, traditionally performed using an adjoint optimization algorithm called the wavefront matching method (WMM) [26]. Training via gradient descent using various cost functions has also been used in simulation for mode-sorting, but no significant advantage with respect to wavefront matching has been observed [19]. Other methods such as genetic algorithms have also been attempted [27].

In parallel, MPLCs have been explored by the optical machine learning community under the name of diffractive optical neural networks (DONNs). Since their initial introduction in 2018 [28], DONNs have been used for a variety of applications including deep learning, image recognition and reconstruction and communications [29, 30]. The physics of DONNs is identical to that of MPLCs, but DONNs are typically trained via backpropagation (gradient descent) on a digital twin. Hashimoto et al. argued that WMM can in fact be interpreted as a variant of gradient descent training, in which every optimization step increases the inner product between the forward and backpropagating modes, as we discuss in detail below [31]. However, the backpropagation method appears to streamline the training of the MPLC, as shown in computational works by Huang et al. [32] and Zhu et al. [33] and the experiment by Liu et al., in which OAM beams have been (de)multiplexed [34].

Here we demonstrate a novel training method for mode-sorters using neural networks and flexible detection regions. We treat the mode-sorter as a DONN and train the phase plates to direct each mode into a separate detection region via backpropagation. The output detection regions are a part of the DONN’s trainable set of parameters. As a result, we achieve mode-sorting with significantly higher efficiencies (probabilities for the input photon in each input mode to reach the appropriate detection region) compared to existing methods, while maintaining similar levels of crosstalk.

2 Phase Plate Training Methods

We begin by briefly describing existing methods of training MPLCs. An MPLC consists of phase plates separated by free-space propagation for the purpose of transforming the input electric field $\Psi_{\text{in}}(x,y)$ into a desired output field $\Psi_{\text{out}}(x,y)$. WMM computes the “forward” optical field $\Psi_{p}^{\mathrm{for}}(x,y)$ as it propagates through each plate (indexed by $p$) from $\Psi_{\text{in}}$ at the input. Additionally, the backward propagation $\Psi_{p}^{\mathrm{back}}(x,y)$ of the field, starting from $\Psi_{\text{out}}$ at the output, is calculated. At each plate, the phase shift $\varphi_{p}(x,y)$ imposed by that plate is updated according to the phase difference between the forward and backward propagating fields [18, 36]:

$\varphi_{p}\to\varphi_{p}-\arg\{\Psi_{p}^{\mathrm{for}}[\Psi_{p}^{\mathrm{back}}]^{*}\}. \qquad (1)$

Footnote 1: In earlier versions of WMM, the phase was instead updated by a constant learning rate $a$ in the direction defined by the sign of the phase difference between the forward and backward propagating fields [35]:

$\varphi_{p}\to\varphi_{p}-a\,\mathrm{sign}[\mathrm{Im}\{\Psi_{p}^{\mathrm{for}}[\Psi_{p}^{\mathrm{back}}]^{*}\}].$

The algorithm updates the phases according to Eq. (1) at each plate for each pass, iterating until convergence. We note that this update rule can be interpreted as a more general form of the single-plate Gerchberg-Saxton phase retrieval algorithm [37].
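As an illustration of the update rule of Eq. (1), the following is a minimal NumPy sketch of one WMM sweep for a single input/target pair. It is not the authors' implementation. It rests on several assumptions: each plate multiplies the field by $e^{i\varphi_{p}}$, the forward field is taken just after plate $p$, all gaps (including input-to-first-plate and last-plate-to-output) are equal, and free space is modelled with the angular-spectrum method. The function names `propagate`, `forward_output`, `overlap` and `wmm_sweep`, and the grid size, beam width and distance in the demo, are hypothetical; only the wavelength and pixel pitch are taken from the text.

```python
import numpy as np

def propagate(field, distance, wavelength, pitch):
    """Angular-spectrum propagation; a negative distance back-propagates."""
    n = field.shape[0]
    freq = np.fft.fftfreq(n, d=pitch)
    fx, fy = np.meshgrid(freq, freq, indexing="ij")
    kz_sq = (1.0 / wavelength) ** 2 - fx ** 2 - fy ** 2
    kz = 2 * np.pi * np.sqrt(np.maximum(kz_sq, 0.0))
    # Evanescent components (kz_sq <= 0) are dropped.
    transfer = np.where(kz_sq > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def forward_output(phases, psi_in, distance, wavelength, pitch):
    """Field at the output plane after all plates (assumed convention exp(+i*phase))."""
    field = psi_in
    for phase in phases:
        field = propagate(field, distance, wavelength, pitch) * np.exp(1j * phase)
    return propagate(field, distance, wavelength, pitch)

def overlap(phases, psi_in, psi_out, distance, wavelength, pitch):
    """Squared overlap between the target and the forward-propagated output."""
    out = forward_output(phases, psi_in, distance, wavelength, pitch)
    return np.abs(np.sum(np.conj(psi_out) * out)) ** 2

def wmm_sweep(phases, psi_in, psi_out, distance, wavelength, pitch):
    """Apply Eq. (1) once at every plate, in order, for one input/target pair."""
    num = len(phases)
    for p in range(num):
        # Forward field just after plate p (includes the current phase of plate p).
        fwd = psi_in
        for q in range(p + 1):
            fwd = propagate(fwd, distance, wavelength, pitch) * np.exp(1j * phases[q])
        # Backward field at the same plane: start at the target, undo later plates.
        back = propagate(psi_out, -distance, wavelength, pitch)
        for q in range(num - 1, p, -1):
            back = propagate(back * np.exp(-1j * phases[q]), -distance, wavelength, pitch)
        phases[p] = phases[p] - np.angle(fwd * np.conj(back))
    return phases

# Demo: steer a Gaussian onto a laterally shifted Gaussian with two plates.
n, pitch, wavelength, distance = 64, 9.2e-6, 786e-9, 0.02
y, x = np.mgrid[:n, :n] - n / 2

def gaussian(shift, width=10.0):
    field = np.exp(-((x - shift) ** 2 + y ** 2) / width ** 2).astype(complex)
    return field / np.sqrt(np.sum(np.abs(field) ** 2))

psi_in, psi_out = gaussian(0.0), gaussian(12.0)
phases = [np.zeros((n, n)) for _ in range(2)]
before = overlap(phases, psi_in, psi_out, distance, wavelength, pitch)
wmm_sweep(phases, psi_in, psi_out, distance, wavelength, pitch)
after = overlap(phases, psi_in, psi_out, distance, wavelength, pitch)
print(f"overlap before: {before:.3f}, after one sweep: {after:.3f}")
```

Under these assumptions each plate update makes the product of the forward and conjugated backward fields real and non-negative at every pixel, so the overlap cannot decrease from one update to the next. Sorting several modes would require accumulating the contributions of all input/target pairs at each plate, which this single-pair sketch omits.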

The WMM can be thought of as an algorithm attempting to maximize the overlap between the forward propagated output field and the desired output [35, 19]

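$\mathrm{Loss}=-\left|\int\Psi_{\text{out}}^{*}\,\Psi_{\text{output plane}}^{\mathrm{for}}\,dx\,dy\right|^{2}. \qquad (2)$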

For the case of sorting spatially overlapping modes into spatially separate modes, the WMM iterations can be applied to each of the $n$ modes one-by-one, in which case the objective function becomes

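$\mathrm{Loss}=-\sum_{i=1}^{n}\left|\int[\Psi_{\text{out}}^{(i)}]^{*}\,\Psi^{\mathrm{for},(i)}\,dx\,dy\right|^{2}. \qquad (3)$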
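On a sampled grid, the overlap objective for $n$ modes can be evaluated as in the sketch below. The function name `overlap_loss` and the `pixel_area` argument are hypothetical, and the integral is approximated by a plain Riemann sum.

```python
import numpy as np

def overlap_loss(targets, outputs, pixel_area=1.0):
    """Minus the sum over modes of |<target_i | output_i>|^2 (discretized)."""
    loss = 0.0
    for target, output in zip(targets, outputs):
        inner = np.sum(np.conj(target) * output) * pixel_area
        loss -= np.abs(inner) ** 2
    return loss

# Two orthonormal toy "modes" on a 4x4 grid.
t0 = np.zeros((4, 4), dtype=complex)
t1 = np.zeros((4, 4), dtype=complex)
t0[0, 0] = 1.0
t1[1, 1] = 1.0
targets = [t0, t1]

# Perfect sorting up to a global phase per mode, and a completely wrong sorting.
perfect = overlap_loss(targets, [np.exp(0.7j) * t0, np.exp(-1.2j) * t1])
swapped = overlap_loss(targets, [t1, t0])
print(perfect, swapped)
```

Because only the modulus of each inner product enters, this objective does not depend on the global phase of each output, but it does prescribe the full spatial shape of each output mode.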

Importantly, the updates (1) are not precisely collinear with the steepest-descent direction with respect to the loss functions (2) or (3), which explains the better performance of gradient descent training via backpropagation [19, 33, 34].

Both WMM and the existing backpropagation implementations prescribe the exact mode of the output field. This is justified if the sorted modes need to be e.g. coupled into single-mode fibers. However, this is often not necessary, e.g. when the goal is only to determine the intensity of each mode in the input field. In this case, we need not prescribe the exact shape of the output modes, but only make sure that they land in different spatial regions of the output plane. This is the approach we take here. In addition to optimizing the phase plates in the traditional backpropagation manner, the training algorithm also chooses a set $\{D_{j}\}$ of non-overlapping regions in the detection plane into which each mode is sent. We find this innovation to significantly improve the performance of the mode-sorter.

The performance of the system can be described as a matrix $I_{ij}$ of the total intensity that is found in the given output detection region $D_{j}$ when the input field is prepared in mode $i$ :

$I_{ij}=\int_{D_{j}}\left|\Psi_{\text{output plane}}^{(i)}\right|^{2}dx\,dy. \qquad (4)$
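With the detection regions represented as binary masks on the output grid, the matrix $I_{ij}$ reduces to a masked sum of intensities. The sketch below assumes this mask representation; the name `intensity_matrix` and the toy data are illustrative only.

```python
import numpy as np

def intensity_matrix(outputs, masks, pixel_area=1.0):
    """I[i, j]: intensity of output field i that falls inside region j.

    outputs: complex array of shape (n, H, W), one output field per input mode.
    masks:   array of shape (n, H, W), nonzero inside region D_j.
    """
    intensity = np.abs(outputs) ** 2
    return np.einsum("ihw,jhw->ij", intensity, masks.astype(float)) * pixel_area

# Toy example on a 2x2 output plane: region 0 is the top row, region 1 the bottom row.
outputs = np.array([[[1.0, 1.0j], [0.0, 0.0]],
                    [[0.0, 0.0], [2.0, 0.0]]], dtype=complex)
masks = np.array([[[1, 1], [0, 0]],
                  [[0, 0], [1, 1]]])
I = intensity_matrix(outputs, masks)
print(I)
```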

We strive to maximize the efficiency of correct classification

$\mathrm{Loss}_{\text{eff}}=-\frac{1}{n}\sum_{i=1}^{n}I_{ii} \qquad (5)$

while minimizing the modal crosstalk — the total relative intensity of wrongly classified light

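$\mathrm{Loss}_{\text{xtalk}}=\frac{1}{n}\sum_{i=1}^{n}\left(1-\frac{I_{ii}}{\sum_{j}I_{ij}}\right). \qquad (6)$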

The final loss function is defined using hyperparameter $\alpha$ as a weighted combination of the modal crosstalk loss and the efficiency loss.

$\mathrm{Loss}=\alpha\,\mathrm{Loss}_{\text{eff}}+(1-\alpha)\,\mathrm{Loss}_{\text{xtalk}}. \qquad (7)$
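Given an intensity matrix $I_{ij}$, the efficiency, the crosstalk and the combined loss can be computed as in the following sketch. The function name `sorter_losses` and the numerical values of the example matrix are made up for illustration.

```python
import numpy as np

def sorter_losses(intensity, alpha):
    """Return (combined loss, efficiency, crosstalk) for an intensity matrix I[i, j]."""
    intensity = np.asarray(intensity, dtype=float)
    correct = np.diag(intensity)
    loss_eff = -correct.mean()                                   # efficiency loss
    loss_xtalk = np.mean(1.0 - correct / intensity.sum(axis=1))  # crosstalk loss
    loss = alpha * loss_eff + (1.0 - alpha) * loss_xtalk
    return loss, -loss_eff, loss_xtalk

# Hypothetical 2-mode sorter: rows are input modes, columns are detection regions.
I = np.array([[0.6, 0.2],
              [0.1, 0.5]])
loss, efficiency, crosstalk = sorter_losses(I, alpha=0.5)
print(f"efficiency={efficiency:.3f}, crosstalk={crosstalk:.3f}, loss={loss:.3f}")
```

The crosstalk term is normalized by the light that reaches any detection region, so light that misses every region lowers the efficiency without raising the crosstalk. This is why the two terms can be traded against each other.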

By varying $\alpha$ , one can opt for levels of efficiency or crosstalk that best suit one’s needs. Additionally, terms can be added to the loss function to impose further constraints on the phase plates, e.g. smoothness or bit depth of $\varphi(x,y)$ .
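The text does not specify how such constraints are implemented. As one possible illustration, the sketch below shows a wrap-insensitive smoothness penalty that could be added to the loss, and a projection of the phase onto a finite number of levels. Both functions (`smoothness_penalty`, `quantize_phase`) are hypothetical, and the quantization is a hard projection rather than a differentiable loss term.

```python
import numpy as np

def smoothness_penalty(phase):
    """Mean of 1 - cos(phase difference) between neighbouring pixels.

    Using the cosine makes a 2*pi jump cost nothing, since it is physically irrelevant.
    """
    dx = np.diff(phase, axis=0)
    dy = np.diff(phase, axis=1)
    return np.mean(1.0 - np.cos(dx)) + np.mean(1.0 - np.cos(dy))

def quantize_phase(phase, bits):
    """Round the phase (mod 2*pi) to the nearest of 2**bits equally spaced levels."""
    levels = 2 ** bits
    step = 2 * np.pi / levels
    index = np.round(np.mod(phase, 2 * np.pi) / step) % levels
    return index * step

rng = np.random.default_rng(0)
flat = np.zeros((8, 8))
noisy = rng.uniform(0.0, 2 * np.pi, size=(8, 8))
quantized = quantize_phase(noisy, bits=8)
print(smoothness_penalty(flat), smoothness_penalty(noisy))
```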

3 Numerical Simulation and Experimental Demonstration

3.1 Setup and Mode-sorter Design

Our mode-sorter and the setup for its characterization are shown in Fig. 1. First, the 786 nm laser (Toptica DL100) is coupled into a fibre to clean the beam and collimated at the output by L1. A telescope (L2, L3) expands the beam so that it covers the entire active surface of SLM1 (Meadowlark 1920x1152). This SLM displays the hologram to generate arbitrary HG modes in the first diffraction order. Subsequently, the desired mode is imaged onto the MPLC mode-sorter by a 4f system (L4, L5) with an iris at its focal plane to select the first order. The MPLC is SLM2 (Meadowlark 1920x1152) facing parallel to a mirror (M7) to facilitate multiple reflections from different areas of the SLM, each acting as a phase plate. The mirror and the SLM are placed on translation and rotation stages to control the number of reflections as well as the distance between the plates. After the exit of the mode-sorter, a camera records the output intensity.

To combat unmodulated light due to the reflection off the front SLM surface, we work in the first diffraction order and take care to prevent the unmodulated light from all plates from entering the final measurement.

3.2 Numerical Comparison of Training Methods

We perform numerical simulations to train the phase plates and test their performance in various experimental situations. These simulations are performed in Python using TorchOptics [38], a package for simulating and training free-space optical systems using the framework of PyTorch, leveraging GPUs and CUDA. Crosstalk vs efficiency plots for sorting 25 modes (HG00 to HG44) with a 3-plate mode-sorter with the geometric parameters matching those of our experiment are shown in Fig. 2, where the efficiency is defined as the negative of the right-hand side of Eq. (5) and the crosstalk by Eq. (6). To determine the benefit of flexible detection, we train two different neural networks to find the phases, one with fixed detection and the other with flexible detection regions, using the loss function Eq. (7).

When the hyperparameter $\alpha$ is varied, the network finds solutions with different levels of crosstalk and efficiency, represented by the two curves in Fig. 2(a). For $\alpha=1$ the system maximizes the efficiency and neglects the crosstalk, as represented by the final point (upper right) of the curves. For the 25-mode, 3-plate sorter with fixed detection regions, this yields a 30% efficiency at a 29% crosstalk. With flexible detection regions, the same crosstalk level (29%) is reached at a 58.5% efficiency. Alternatively, for $\alpha\approx 0$, shown in the inset to Fig. 2(a), the main goal of the network is to prevent crosstalk. In all cases, the crosstalk saturates below 1%; further crosstalk reduction is minimal and comes at a significant cost in efficiency.

3.3 Experimental Results

Before we can analyze the mode-sorter performance, we need to characterize the HG modes generated via SLM1. We solve this task using the method of Bolduc et al. [3], using off-axis holography [39]. The reconstructed modes are shown in Fig. 3(a) along with their overlap matrix in Fig. 3(b). The modes were found to have a mean fidelity of 97.8 % and a mean modal overlap of 1.5 % for 25 modes. This represents the limit for the lowest achievable crosstalk after mode-sorting.
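To make the overlap matrix concrete, the sketch below builds ideal HG modes on a pixel grid and computes their pairwise squared overlaps. It uses the waist (29 pixels) and plate size (200 pixels) quoted in this section and assumes the waist is the $1/e^{2}$ intensity radius. For ideal sampled modes the matrix is close to the identity; the experimental matrix is instead built from the holographically reconstructed fields, and the exact fidelity definition used for the reported numbers is not reproduced here. The names `hg_mode` and `overlap_matrix` are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermval  # physicists' Hermite polynomials

def hg_mode(m, n, waist_px, size):
    """Normalized HG_mn field sampled on a size x size grid centred on the beam."""
    x = np.arange(size) - (size - 1) / 2

    def profile(order):
        coeffs = np.zeros(order + 1)
        coeffs[order] = 1.0
        return hermval(np.sqrt(2) * x / waist_px, coeffs) * np.exp(-(x / waist_px) ** 2)

    field = np.outer(profile(m), profile(n)).astype(complex)
    return field / np.sqrt(np.sum(np.abs(field) ** 2))

def overlap_matrix(modes):
    """Matrix of squared overlaps |<mode_a | mode_b>|^2."""
    flat = np.array([mode.ravel() for mode in modes])
    return np.abs(flat.conj() @ flat.T) ** 2

# 25 modes, HG00 to HG44, with a 29-pixel waist on a 200 x 200 grid.
modes = [hg_mode(m, n, waist_px=29, size=200) for m in range(5) for n in range(5)]
gram = overlap_matrix(modes)
off_diagonal = gram - np.diag(np.diag(gram))
print(gram.shape, off_diagonal.max())
```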

We demonstrate three versions of the mode-sorter: 1 plate for 4 modes, 2 plates for 4 and 9 modes, and 3 plates for 16 and 25 modes. The HG modes have a waist size of 29 pixels for an SLM pixel size of 9.2 $\mu$ m. Each phase plate measures $200\times 200$ pixels. The separation between the SLM and the mirror is 4.25 cm. This spacing allows enough propagation distance to separate the diffraction orders and make full use of each phase plate.
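The quoted numbers can be put into physical units with a few lines of arithmetic. The sketch below assumes that the waist is the $1/e^{2}$ intensity radius and that the optical path between successive phase plates is roughly twice the SLM-mirror separation (near-normal incidence); neither assumption is stated explicitly in the text.

```python
import math

wavelength = 786e-9          # m, laser wavelength
pixel = 9.2e-6               # m, SLM pixel size
waist = 29 * pixel           # m, HG waist (assumed 1/e^2 intensity radius)
plate_width = 200 * pixel    # m, side length of one phase plate
gap = 4.25e-2                # m, SLM-to-mirror separation
path = 2 * gap               # m, plate-to-plate path (assumed near-normal incidence)

# Gaussian-beam relations for the fundamental mode.
rayleigh = math.pi * waist ** 2 / wavelength
radius_next = waist * math.sqrt(1 + (path / rayleigh) ** 2)

print(f"waist        = {waist * 1e6:.1f} um")
print(f"plate width  = {plate_width * 1e3:.2f} mm")
print(f"Rayleigh     = {rayleigh * 1e2:.1f} cm")
print(f"path / z_R   = {path / rayleigh:.2f}")
print(f"HG00 radius after one gap = {radius_next * 1e6:.1f} um")
```

Under these assumptions the waist is about 267 µm, each plate is 1.84 mm wide, and the plate-to-plate path of 8.5 cm is roughly 0.3 Rayleigh ranges of the fundamental mode.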

Figure 4 shows the results for the 3-plate sorter with 25 HG modes. The training finds the detection regions [Fig. 4(a)] and the phase plates [Fig. 4(e)]. The trained phase plates are symmetric, reflecting the inherent symmetry of the HG mode set. While training, slight numerical instabilities can lead to asymmetric plates. For this reason, we force the plates to remain symmetrical through appropriate parametrization. This makes little difference in simulated performance but greatly aids in experimental alignment.
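The parametrization itself is not described in the text. One simple way to enforce mirror symmetry about both axes, shown here purely as an assumed example, is to train only one quadrant and tile its mirror images. The name `symmetric_plate` is hypothetical.

```python
import numpy as np

def symmetric_plate(quadrant):
    """Build a plate that is mirror-symmetric about both axes from one quadrant."""
    top = np.concatenate([quadrant[:, ::-1], quadrant], axis=1)   # mirror left-right
    return np.concatenate([top[::-1, :], top], axis=0)            # mirror up-down

# A 100 x 100 trainable quadrant yields a 200 x 200 phase plate.
rng = np.random.default_rng(1)
quadrant = rng.uniform(0.0, 2 * np.pi, size=(100, 100))
plate = symmetric_plate(quadrant)
print(plate.shape)
```

In a gradient-based framework the quadrant would be the trainable tensor, so the symmetry holds exactly at every step regardless of numerical noise in the gradients.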

We found the precision of the input light field location with respect to the phase plates to be critical, with a displacement of a fraction of an SLM pixel significantly affecting the performance. To address this, we fine-tune this location by adjusting M5-7 and SLM2.

Experimental imperfections cause the mode output fields to deviate from their theoretically predicted shapes. To address this, we re-optimize the detection regions accounting for the experimentally measured outputs, as illustrated in Fig. 5.
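The re-optimization procedure is not detailed in the text. A simple heuristic in the same spirit, given here only as a hypothetical sketch, assigns every camera pixel to the mode that is brightest there and optionally discards pixels where that mode does not dominate. The function `assign_regions`, the `purity_threshold` parameter and the toy intensities are all assumptions.

```python
import numpy as np

def assign_regions(intensities, purity_threshold=0.0):
    """Return boolean masks of shape (n, H, W) defining non-overlapping regions.

    intensities: array of shape (n, H, W), measured output intensity per input mode.
    A pixel goes to the brightest mode if that mode carries more than
    purity_threshold of the total intensity at that pixel.
    """
    intensities = np.asarray(intensities, dtype=float)
    total = intensities.sum(axis=0)
    winner = intensities.argmax(axis=0)
    purity = np.divide(intensities.max(axis=0), total,
                       out=np.zeros_like(total), where=total > 0)
    keep = purity > purity_threshold
    return np.stack([(winner == i) & keep for i in range(len(intensities))])

# Toy measurement: two modes on a 2 x 2 camera.
measured = np.array([[[4.0, 1.0], [0.0, 0.0]],
                     [[1.0, 3.0], [0.0, 2.0]]])
loose = assign_regions(measured)                       # favours efficiency
strict = assign_regions(measured, purity_threshold=0.78)  # favours low crosstalk
print(loose.astype(int))
print(strict.astype(int))
```

Raising the threshold removes ambiguous pixels from all regions, which lowers crosstalk at the expense of efficiency, mirroring the trade-off controlled by $\alpha$ during training.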

The mode-sorter performance results are summarized in Fig. 6. The mode-sorter with flexible detection regions outperforms its fixed detection counterpart. While both methods offer a trade-off between the crosstalk and efficiency, the fixed regions appear to have an earlier saturation point where the efficiency can no longer be improved at the expense of crosstalk.

We attribute most of the simulation-experiment gap to the various imperfections that come from using an SLM, including the SLM cavity effect and pixel crosstalk [40]. The resulting error accumulates with each phase plate. We believe that transmissive fabricated phase plates would not suffer from many of these issues [41].

Figure 7 shows the performance of a single-plate sorter with four modes. Remarkably, the flexible-detection mode-sorter performs better in the experiment than the fixed-region mode-sorter does in simulation. This may be beneficial in many imaging applications such as phase-contrast [42] or dark-field [43] microscopy, where the goal is to prevent the light from the $\mathrm{HG}_{00}$ mode from contaminating other channels.

4 Conclusion

In conclusion, we have demonstrated that mode-sorters can be trained using a neural network approach, akin to DONNs, and that allowing for flexible detection regions outperforms the traditional MPLC approach with fixed detection regions. Using the trade-off between crosstalk and efficiency, different applications (e.g. imaging or communication) can choose the levels best suited to the respective tasks. For example, in a task requiring state discrimination such as optical communication using multiplexed OAM states [44], a relatively high crosstalk can be tolerated, so efficiency should be favoured.

The design for the physical mode-sorter is simple and requires only an SLM and a mirror, allowing for easy reproducibility in most optics labs. For applications where the mode-sorter need not be reconfigurable, phase plates can be fabricated to avoid the spurious SLM effects and give a better performance.


Acknowledgments

The project is funded by EPSRC Standard Grant EP/Y020596/1 and EPSRC Impact Acceleration Account Award EP/X525777/1. KB is supported by the Clarendon Fund scholarship.

Bibliography

[1] Z. Zhu, M. Janasik, A. Fyffe, et al., “Compensation-free high-dimensional free-space optical communication using turbulence-resilient vector beams,” Nature Communications 12 (2021).
[2] M. Krenn, R. Fickler, M. Fink, et al., “Communication with spatially modulated light through turbulent air across Vienna,” New Journal of Physics 16, 113028 (2014).
[3] E. Bolduc, N. Bent, E. Santamato, et al., “Exact solution to simultaneous intensity and phase encryption with a single phase-only hologram,” Opt. Lett. 38, 3546–3549 (2013).
[4] B. J. Puttnam, G. Rademacher, and R. S. Luís, “Space-division multiplexing for optical fiber communications,” Optica 8, 1186 (2021).
[5] M. Tsang, R. Nair, and X.-M. Lu, “Quantum theory of superresolution for two incoherent optical point sources,” Phys. Rev. X 6, 031033 (2016).
[6] U. G. Būtaitė, H. Kupianskyi, T. Čižmár, and D. B. Phillips, “How to build the “optical inverse” of a multimode fibre,” Intelligent Computing (2022).
[7] X. Fang, X. Hu, B. Li, et al., “Orbital angular momentum-mediated machine learning for high-accuracy mode-feature encoding,” Light Sci. Appl. 13 (2024).
[8] Z. Dutton, R. Kerviche, A. Ashok, and S. Guha, “Attaining the quantum limit of superresolution in imaging an object’s length via predetection spatial-mode sorting,” Phys. Rev. A 99 (2019).