Near-optimal decomposition of unitary matrices using phase masks and the discrete Fourier transform

Vincent Girouard; Nicol\'as Quesada

arXiv:2508.20010·physics.optics·December 23, 2025

Near-optimal decomposition of unitary matrices using phase masks and the discrete Fourier transform

Vincent Girouard, Nicol\'as Quesada

PDF

TL;DR

This paper presents a new analytical method for decomposing unitary matrices into phase masks and Fourier transforms, enabling efficient design of optical interferometers with reduced complexity and improved robustness.

Contribution

It introduces a near-optimal analytical decomposition of unitary matrices using phase masks and Fourier transforms, surpassing previous methods in efficiency.

Findings

01

Reduces the optical depth by a factor of 3 compared to previous methods.

02

Provides an analytical alternative to numerical optimization for designing interferometers.

03

Enables robust and efficient implementation of universal multiport interferometers.

Abstract

Universal multiport interferometers (UMIs) have emerged as a key tool for performing arbitrary linear transformations on optical modes, enabling precise control over the state of light in essential applications of classical and quantum information processing such as neural networks and boson sampling. While UMI architectures based on Mach-Zehnder interferometer networks are well established, alternative approaches that involve interleaving fixed multichannel mixing layers and phase masks have recently gained interest due to their high robustness to losses and fabrication errors. However, these approaches currently lack optimal analytical methods to compute design parameters with low optical depth. In this work, we introduce a constructive decomposition of unitary matrices using a sequence of $2 N + 5$ phase masks interleaved with $2 N + 4$ discrete Fourier transform matrices. This…

Figures4

Click any figure to enlarge with its caption.

Equations66

a_{j} \to b_{j} = k = 0 \sum N - 1 U_{j k} a_{k} .

a_{j} \to b_{j} = k = 0 \sum N - 1 U_{j k} a_{k} .

U = D^{(0)} F D^{(1)} F D^{(2)} \dots F D^{(L)},

U = D^{(0)} F D^{(1)} F D^{(2)} \dots F D^{(L)},

D^{(k)} = diag {e^{i ϕ_{0}^{(k)}}, e^{i ϕ_{1}^{(k)}}, \dots, e^{i ϕ_{N - 1}^{(k)}}},

D^{(k)} = diag {e^{i ϕ_{0}^{(k)}}, e^{i ϕ_{1}^{(k)}}, \dots, e^{i ϕ_{N - 1}^{(k)}}},

F_{j k} = \frac{1}{N} e^{- i 2 π j k / N} for i, j \in {0, 1, \dots, N - 1} .

F_{j k} = \frac{1}{N} e^{- i 2 π j k / N} for i, j \in {0, 1, \dots, N - 1} .

T_{m, n} (θ_{m}, θ_{n})

T_{m, n} (θ_{m}, θ_{n})

X = \frac{1}{2} (11 1 - 1) .

X = \frac{1}{2} (11 1 - 1) .

K_{j k} = ⎩ ⎨ ⎧ 1 k = 2 j, j \leq \frac{N}{2} - 1 1 k = 2 j + 1 - N, j > \frac{N}{2} - 1 0 otherwise .

K_{j k} = ⎩ ⎨ ⎧ 1 k = 2 j, j \leq \frac{N}{2} - 1 1 k = 2 j + 1 - N, j > \frac{N}{2} - 1 0 otherwise .

T_{odd}^{(k)} = X Θ^{(k)} X .

T_{odd}^{(k)} = X Θ^{(k)} X .

X = \frac{1}{2} (I I I - I),

X = \frac{1}{2} (I I I - I),

Θ^{(k)} = diag {

Θ^{(k)} = diag {

e^{i θ_{1}^{(k)}}, e^{i θ_{3}^{(k)}}, \dots, e^{i θ_{N - 1}^{(k)}}} .

P_{j k} = ⎩ ⎨ ⎧ 1 k = j + 1 (mod \frac{N}{2}), j \leq \frac{N}{2} - 1 1 k = j, j > \frac{N}{2} - 1 0 otherwise .

P_{j k} = ⎩ ⎨ ⎧ 1 k = j + 1 (mod \frac{N}{2}), j \leq \frac{N}{2} - 1 1 k = j, j > \frac{N}{2} - 1 0 otherwise .

T_{even}^{(k)} = P^{T} X Ω^{(k)} X P,

T_{even}^{(k)} = P^{T} X Ω^{(k)} X P,

Ω^{(k)} = diag {

Ω^{(k)} = diag {

e^{i θ_{2}^{(k)}}, e^{i θ_{4}^{(k)}}, \dots, e^{i θ_{0}^{(k)}}} .

U = D K^{T} k = 1 \prod N /2 T_{even}^{(2 k)} Ξ^{(2 k)} T_{odd}^{(2 k - 1)} K D^{'},

U = D K^{T} k = 1 \prod N /2 T_{even}^{(2 k)} Ξ^{(2 k)} T_{odd}^{(2 k - 1)} K D^{'},

T_{bi-layer} = T_{even} Ξ T_{odd} = P^{T} X Ω X P Ξ X Θ X,

T_{bi-layer} = T_{even} Ξ T_{odd} = P^{T} X Ω X P Ξ X Θ X,

P = X Λ X,

P = X Λ X,

H_{j j} = \frac{1}{2} [1 - (- 1)^{j}] + \frac{1}{2} [1 + (- 1)^{j}] e^{i 2 π j / N} .

H_{j j} = \frac{1}{2} [1 - (- 1)^{j}] + \frac{1}{2} [1 + (- 1)^{j}] e^{i 2 π j / N} .

T_{bi-layer} = X [Λ^{T} ΩΛ X Ξ X Θ] X,

T_{bi-layer} = X [Λ^{T} ΩΛ X Ξ X Θ] X,

Ξ = (I e^{- i ϵ} 0 0 I e^{i ϵ}),

Ξ = (I e^{- i ϵ} 0 0 I e^{i ϵ}),

X Ξ X = (I cos ϵ - I i sin ϵ - I i sin ϵ I cos ϵ) .

X Ξ X = (I cos ϵ - I i sin ϵ - I i sin ϵ I cos ϵ) .

C_{j j} = e^{i (- 1)^{j + 1} ϵ} .

C_{j j} = e^{i (- 1)^{j + 1} ϵ} .

T_{bi-layer} = X [F H F^{†} Ω F^{†} H C F Θ] X .

T_{bi-layer} = X [F H F^{†} Ω F^{†} H C F Θ] X .

Π_{j k} = ⎩ ⎨ ⎧ 1 j = k = 0 1 j = N - k 0 otherwise .

Π_{j k} = ⎩ ⎨ ⎧ 1 j = k = 0 1 j = N - k 0 otherwise .

T_{bi-layer} = X [F H F p (Ω) F H C F Θ] X,

T_{bi-layer} = X [F H F p (Ω) F H C F Θ] X,

U =

U =

\times

\times

X = G Y G,

X = G Y G,

G = (I 0 0 i I)

G = (I 0 0 i I)

Y = F^{†} E F,

Y = F^{†} E F,

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

Near-optimal decomposition of unitary matrices using

phase masks and the discrete Fourier transform

Vincent Girouard

Département de génie physique, École polytechnique de Montréal, Montréal, QC, H3T 1J4, Canada

[email protected]

Nicolás Quesada

Département de génie physique, École polytechnique de Montréal, Montréal, QC, H3T 1J4, Canada

[email protected]

Abstract

Universal multiport interferometers (UMIs) have emerged as a key tool for performing arbitrary linear transformations on optical modes, enabling precise control over the state of light in essential applications of classical and quantum information processing such as neural networks and boson sampling. While UMI architectures based on Mach-Zehnder interferometer networks are well established, alternative approaches that involve interleaving fixed multichannel mixing layers and phase masks have recently gained interest due to their high robustness to losses and fabrication errors. However, these approaches currently lack optimal analytical methods to compute design parameters with low optical depth. In this work, we introduce a constructive decomposition of unitary matrices using a sequence of $2N+5$ phase masks interleaved with $2N+4$ discrete Fourier transform matrices. This decomposition can be leveraged to design universal interferometers based on phase masks and multimode interference couplers, implementing a discrete Fourier transform, offering an analytical alternative to conventional numerical optimization-based designs and reducing by a factor of 3 the previous best known analytical methods.

††journal: opticajournal

1 Introduction

Linear optics is a powerful framework for classical and quantum information processing, owing to the high transmission speed, low power consumption, and low decoherence of photons when used as information carriers. In addition, integrated photonics provides a mature and readily available platform for realizing these technologies in a compact, stable, and scalable manner [1]. A central component of such systems is the universal multiport interferometer (UMI), a reconfigurable optical device capable of performing arbitrary linear transformations on multiple optical channels [2, 3]. UMIs play a crucial role in many key applications of quantum information processing such as boson sampling [4, 5, 6] and its variants [7, 8, 9], linear optical quantum computing [10, 11, 12, 13], quantum simulations [14, 15, 16], quantum neural networks [17, 18], and continuous-variable quantum information processing [19, 20, 21, 22, 23]. They are also crucial in classical applications, including fiber optic communication [24, 25], sensing [26], signal processing [27, 28], optical neural networks [29], and imaging [30, 31]. The development of compact, loss- and error-tolerant UMI designs is therefore critical for advancing those technologies.

The decomposition of unitary matrices into factors implementable by optical components plays a central role in the design of UMIs. It is well established that any unitary transformation can be implemented using specific arrangements of two-mode components [32]. In a seminal work, Reck et al. [33] showed that arbitrary linear transformations can be realized using a triangular network of Mach-Zehnder interferometers (MZI) and single-mode phase shifters, as shown in Fig. 1(a). This architecture enabled the realization of high-fidelity integrated optical processors for up to $N=6$ modes [34]. This scheme was later improved by Clements et al. [35], who used the same unit cell in a rectangular mesh to achieve a more compact and loss-tolerant design, as depicted in Fig. 1(b). The Clements et al. design has since been demonstrated for systems with up to $N=20$ modes [36, 37, 38]. Other similar configurations have also been proposed to further enhance error and loss tolerance [39, 40, 41], computational cost [42], or optical depth [43]. However, scaling such architectures to a large number of modes remains a challenge due to the poor robustness of MZI networks to fabrication imperfections. Even small deviations in beam splitter reflectivities can significantly compromise universality of large UMIs [44], imposing strict requirements on component precision and increasing fabrication complexity. To mitigate these effects, various strategies are typically employed [45, 44, 46, 47, 48, 49, 50, 51, 52, 53]. Some approaches introduce redundant layers or mesh non-localities [45, 44, 46] and rely on global optimization to set MZI parameters. However, such methods increase the optical depth of the device, require pre-characterization of component errors, and are computationally expensive. Other strategies use self-configuring networks [47, 48, 49, 50, 51, 52], requiring additional sources and power detectors inside or outside the circuit, which further increases fabrication complexity.

To improve tolerance to losses and component imperfections, several approaches have explored alternative designs based on multichannel components [54, 55, 56, 57]. Among them, architectures based on interleaving a fixed multichannel mixing layer with phase masks [58, 59, 60, 61, 62, 63, 64] [see Fig. 1(c)] are of particular relevance because of their high robustness against perturbations in the mixing layer [61, 65, 60]. Parameter counting gives a lower bound of at least $N+1$ phase masks needed to represent an arbitrary unitary. This is consistent with numerical evidence suggesting that interleaving $N+1$ [61, 63] or $N+2$ [64] layers of phase masks with almost any dense unitary matrix could result in a universal architecture [66]. Notable examples of mixing layers include the discrete Fourier transform (DFT), which can be implemented using a multimode interference coupler [62, 67], and the discrete fractional Fourier transform (DFrFT), which can be realized using multiport waveguide arrays [68]. However, unlike designs based on MZI networks, those built from multichannel mixing layers lack optimal analytical algorithms for computing the phase mask parameters and thus almost always rely on numerical optimization procedures [63, 61, 64, 66]. As a result, their universality cannot be guaranteed, and the high computational cost associated with computing parameters prevents their scalability to larger systems, highlighting the need for optimal analytic designs.

Significant progress has already been made toward an analytical decomposition of linear transformations into a sequence of phase masks and DFT operations. Tools from group theory have been used to provide an existence proof that all unitary operations can be constructed from alternating DFTs and phase masks [69], although the exact number of masks required in the sequence is not specified. Later works in matrix analysis have shown that any square matrix can be decomposed into a product of diagonal and circulant matrices [70] and provided a constructive proof requiring at most $2N-1$ factors for an $N\times N$ matrix [71]. This result is particularly relevant since circulant matrices can be diagonalized by the DFT matrix [72]. However, the algorithm does not guarantee that the diagonal matrices are unitary, which prevents them from being implemented as passive phase masks. Using the Sinkhorn normal form for unitary matrices, Idel and Wolf provided a non-constructive proof that any $N\times N$ unitary matrix can be decomposed into $2N-1$ diagonal unitaries interleaved with partial DFTs [73]. However, their work does not provide a procedure for explicitly computing the phase mask parameters.

The most promising approach toward the goal of decomposing unitary linear transformations into a sequence of phase masks and DFT operations was developed by López Pastor et al. [62], who introduced a constructive proof showing that any $N\times N$ unitary matrix can be decomposed into a product of $6N+1$ phase masks interleaved with DFT matrices. Their method first applies the algorithm of Clements et al. [35] to express the unitary as a rectangular network of beam splitters and phase shifters, and then interprets each layer as a multimode interaction, which is subsequently decomposed into a product of circulant and diagonal matrices. However, this method generates a sequence with a significant phase mask overhead, as numerical evidence and simple parameter counting suggest that as few as $N+1$ layers could suffice [61, 66].

In this work, we introduce a new analytical decomposition of unitary matrices in terms of DFTs and phase masks, which yields a universal multiport interferometer with only a third of the optical depth compared to previous analytical designs [62]. We also provide an open-source Python implementation of the algorithm. Our approach builds on the framework of López Pastor et al. [62], but begins with the interferometer of Bell and Walmsley [43], which uses a symmetric MZI (sMZI) as unit cell instead of the asymmetric MZI used in the Clements et al. design [35]. This building block is not only more compact but also more symmetric, which results in significant simplifications in the sequence of phase masks. We also introduce a circuit identity to increase the symmetry of the layers, allowing additional compression of the overall sequence. This reduction in the number of layers could considerably lower optical losses, fabrication costs, and size of UMIs based on this architecture, thereby facilitating their scalability to a larger number of modes.

2 Decomposition Method

A lossless linear transformation acting on $N$ optical modes can be represented by a $N\times N$ unitary matrix $U$ . Such a matrix transforms the annihilation operators (or the classical amplitudes) of the input modes according to

[TABLE]

Arbitrary unitary matrices of size $N\times N$ are characterized by $N^{2}$ real degrees of freedom [74]. Hence, any architecture aiming to implement arbitrary transformations needs at least the same amount of controllable parameters. This constraint puts a lower bound on the number of phase masks required in the design, as at least $N+1$ masks are necessary to ensure the complete parametrization of the unitary. Without loss of generality, in what follows we assume that $N$ is even.

We want to find an analytical decomposition of $U$ such that

[TABLE]

where $L$ should be as small as possible, $D^{(k)}$ are diagonal phase masks of the form

[TABLE]

and $F$ is the DFT matrix [75], whose elements are given by

[TABLE]

An important property of the DFT matrix, which will prove to be particularly useful, is that it diagonalizes circulant matrices [72]. Circulant matrices have each row given by a cyclic right shift of the previous one. If $C$ is circulant, then there exists a diagonal matrix $D$ such that $C=F^{\dagger}DF$ . The problem of finding a decomposition of $U$ as in (2) is therefore equivalent to finding a decomposition of $U$ in terms of diagonal and circulant unitary matrices.

Bell and Walmsley [43] demonstrated that any $N\times N$ unitary matrix can be decomposed into a rectangular mesh of $N(N-1)/2$ symmetric MZI (sMZI) with additional phase shifters located along the edges of the mesh. An example of this architecture is shown in Fig. 2(a) for $N=6$ modes. Their design is characterized by the use of a symmetric non-universal unit cell, which is more compact than those employed in [33, 35], resulting in an interferometer with reduced optical depth. The sMZI used in this design is made from two balanced beam splitters and two internal phase shifters, as depicted in Fig. 2(b). It acts on two consecutive modes $m$ and $n$ according to the transfer matrix $T_{m,n}(\theta_{m},\theta_{n})$ given by

[TABLE]

where $\Theta=\text{diag}\left\{e^{i\theta_{m}},e^{i\theta_{n}}\right\}$ , $\Sigma=\left(\theta_{m}+\theta_{n}\right)/2$ , $\delta=\left(\theta_{m}-\theta_{n}\right)/2$ , and $X$ corresponds to the 50:50 beam splitter matrix given by

[TABLE]

When the unit cell operates within a multimode system ( $N>2$ ), the resulting transfer matrix is the $2\times 2$ matrix defined in (5) embedded into the $N\times N$ identity matrix at modes $m$ and $n$ .

We will interpret each sMZI layer in the interferometer from Fig. 2 as a multimode interaction. To ensure that each layer acts uniformly on the entire mode space, we will impose periodic boundary conditions on the mode indices so that interactions between channels $j$ and $j+1$ (mod $N$ ) will occur in even-numbered layers, as shown in Fig. 3(a). The fictitious interaction between modes 0 and $N-1$ happening at the boundary is set to the bar state (identity matrix), but is included for completeness. We will also follow the procedure of [62] and relabel the channels of the interferometer such that $\left\{0,1,2,\dots,N-1\right\}\rightarrow\left\{0,\frac{N}{2},1,\frac{N}{2}+1,\dots,N-1\right\}$ . This transformation can be achieved using the permutation matrix $K$ , defined by

[TABLE]

The effect of this permutation on the channel labels can be seen in Fig. 3(a) for $N=6$ channels.

With this relabeling, odd sMZI layers exhibit a simple structure, where each channel $j$ of the second half of the system (i.e., $j\geq N/2$ ) interacts with channel $j-N/2$ , as highlighted in Fig. 3(b). This structure allows the transformation performed by an odd layer $k$ to be described by

[TABLE]

In (8), $X$ represents a layer of $N/2$ beam splitters acting between pairs of channels $j$ ( $j\geq N/2$ ) and $j-N/2$ . The corresponding transformation is given by the block matrix

[TABLE]

where $I$ is the $\frac{N}{2}\times\frac{N}{2}$ identity matrix. $\Theta^{(k)}$ is a diagonal matrix that represents a layer of phase shifters applied to all modes and is given by

[TABLE]

In even layers, all unit cells are shifted by one, resulting in interactions between each channel $j$ ( $j\geq N/2$ ) and $j-N/2+1$ [see Fig. 3(b)]. The transformation can be described by the same sequence as (8), provided that a channel permutation is applied beforehand. As detailed in [62], a cyclic shift must be applied to the first half of the modes to map each channel $j-N/2+1$ (mod $N/2$ ) ( $j\geq N/2$ ) to $j-N/2$ , thereby restoring the structure observed in odd layers thanks to the invariance of sMZIs under channel exchange. This transformation can be implemented using the permutation matrix $P$ , defined as

[TABLE]

This permutation cyclically shifts by one the first half of the modes while leaving the second half unchanged. We can thus express the transformation performed in even layers by

[TABLE]

where

[TABLE]

The unitary matrix $U$ can therefore be expressed in an intermediate form as

[TABLE]

where $T_{\text{odd}}$ and $T_{\text{even}}$ are given by (8) and (12). The matrices $D^{\prime}$ and $D$ are the diagonal phase masks applied at the input and output of the circuit [see Fig. 2], while $\Xi^{(k)}=\text{diag}\left\{1,1,\dots,e^{i\xi^{(k)}}\right\}$ corresponds to the edge phase shifters present in the Bell and Walmsley architecture [43]. In the decomposition (14), each recurring sMZI bi-layer takes the form

[TABLE]

where the layer indices have been omitted for clarity.

López Pastor et al. [62] showed that the matrix $P$ in (11) can be decomposed as

[TABLE]

where $\Lambda=F^{\dagger}HF$ is circulant, and $H$ is a diagonal matrix with entries

[TABLE]

By substituting (16) into (15), and using the fact that $XX=I$ , we obtain

[TABLE]

where the square brackets indicate that the two $X$ matrices at the edges will cancel out for all but the first and last bi-layers.

The expression (18) can be further simplified by using circuit identities to relocate the edge phases within the interferometer. In the Bell and Walmsley design [43], edge phases can be moved to different modes by adjusting the global phase of the adjacent sMZI, as illustrated in Figs. 4(a) and 4(b). This principle can be leveraged to distribute evenly the edge phases across all the modes in the interferometer, as explained in Fig. 4(c). Thus, given an initial edge phase shift of angle $\xi$ , the matrix $\Xi=\text{diag}\left\{1,1,\dots,e^{i\xi}\right\}$ becomes

[TABLE]

where $\epsilon=\xi/N$ . Using this new symmetric form for $\Xi$ , it can be seen that the product $X\Xi X$ in (18) becomes circulant, i.e.,

[TABLE]

The matrix in (20) can therefore be diagonalized by DFTs into the $C$ matrix, whose entries are given by

[TABLE]

After these simplifications, all circulant matrices in (18) can be diagonalized, which results in

[TABLE]

An important property of the DFT matrix is that $F^{4}=I$ , which means that $F^{\dagger}=F^{3}=\Pi F=F\Pi$ where $\Pi$ is a permutation matrix with elements given by

[TABLE]

It follows that all daggers can be removed from (22) to get

[TABLE]

where $p:U\rightarrow\Pi U\Pi$ . Using (24) in expression (14) for $U$ , we get

[TABLE]

The final step is to decompose the two remaining $X$ matrices in (25). López Pastor et al. [62] showed that $X$ can be written as

[TABLE]

where

[TABLE]

is diagonal, and

[TABLE]

is circulant, with the matrix $E$ given by

[TABLE]

(26) and (28) can thus be used in (25) to obtain the final expression for $U$ :

[TABLE]

where $k:U\rightarrow KUK^{T}$ . Decomposing $U_{p}=K^{T}\Pi U\Pi K$ yields an expression for $U$ entirely in terms of DFT and diagonal matrices. The decomposition (30) consists of a sequence of $2N+5$ phase masks and $2N+4$ DFT matrices for an $N\times N$ unitary matrix. Only the matrices with upper indices depend on $U$ , while the others are fixed. Although the proposed derivation only works in the case of unitary matrices of even dimension, odd unitaries can always be embedded into a larger even-dimensional identity matrix at the cost of introducing two additional phase masks in the final sequence.

3 Discussion

The universal design proposed in this work represents a significant improvement in mask overhead compared to the previous analytical approach, which required $6N+1$ masks [62]. The final sequence obtained in (30) consists of a product of $2N+5$ phase masks interleaved with $2N+4$ DFT layers, representing a 66% reduction in optical depth for large $N$ . Our main contribution over this previous design lies in the use of a multiport interferometer built from symmetric MZIs. Thanks to their more compact layout, all phase shifters within a given sMZI layer are aligned in parallel, enabling highly parametrized phase masks and helping reduce the overall circuit depth. By applying circuit identities, localized edge phases were uniformly distributed across all modes, leading to more convenient and symmetric matrix representations. Interestingly, by relocating a single component in the unit cell — specifically the external phase shifter, which accounts for half of the length taken by phase shifters in the Clements et al [35] interferometer — we are able to divide the overall depth by a factor of 3 when using DFT layers and phase masks.

This decomposition has the potential to enable the design of UMIs that are more robust, compact, cheap, and that exhibit lower propagation losses. These improvements could facilitate the scaling of photonic processors to a larger number of modes, allowing more complex circuits to be integrated on a single photonic chip. Such advancements would benefit the development of both classical and quantum photonic technologies. Although a full analysis of imperfections in our decomposition lies outside the scope of this work, we would like to highlight that it is highly symmetric, and thus if each element, DFT and phase mask, has identical transmission properties then losses will be balanced for the whole interferometer, a highly desirable feature. The analytic method that was developed also provides a fast and exact way to compute the mask parameters without relying on numerical optimization, thereby enabling faster and more energy-efficient programming of UMIs, as the time complexity of this approach matches that of the Clements et al. or Bell and Walmsley schemes [35, 43]. Additionally, this design is also resilient to noise and fabrication errors since perturbations in the mixing layer do not compromise universality [65], eliminating the need for redundant layers. Moreover, we expect that an analytical framework could help develop faster and more accurate error correction strategies in the presence of fabrication imperfections.

We note that the derivation introduced in this work gives rise to a continuous family of new analytical decompositions of a unitary matrix as sequences of mixing layers and phase masks. Specifically, many complex Hadamard matrices that are equivalent to the $N\times N$ DFT matrix can serve as valid mixing layers. Given a permutation matrix $P$ and two diagonal unitaries $\Lambda_{1}$ and $\Lambda_{2}$ , we can define a generalized mixing layer as $\tilde{F}=\Lambda_{1}PFP^{T}\Lambda_{2}$ . By appropriately adjusting the phase masks such that $\tilde{D}^{(k)}=\Lambda_{2}^{*}PD^{(k)}P^{T}\Lambda_{1}^{*}$ , they will remain diagonal and the structure of the decomposition will be preserved. The method could therefore be used with other physical implementations of mixing layers that are not represented by the DFT matrix.

While this decomposition is of particular relevance for photonics, with applications ranging from boson sampling and quantum simulations to classical signal processing, it could also be applied to any kind of linear wave, enabling programmable wave evolution in a broader context. More generally, our method introduces a simple and constructive matrix factorization relying only on diagonal unitary matrices and discrete Fourier transforms, or equivalently, on diagonal and circulant unitary matrices. Despite these promising advances, the number of phase masks required in this approach remains roughly twice the expected lower bound of $N+1$ that can be observed as a phase transition in many numerical experiments. This means that our design is still suboptimal in the number of layers. Achieving this lower bound remains an open challenge, but we expect that our work represents a significant step toward that goal.

**Disclosures — ** The authors declare no conflict of interest.

Software Implementation — A Python implementation of our algorithm, as well as the ones of Clements et al. [35], Bell and Walmsley [43] and López Pastor, Lundeen and Marquardt [62] is available in the Unitary-Decomp package [76]. The package can also perform gradient-based numerical decompositions of unitaries using JAX [77].

**Acknowledgments — ** The authors thank the Ministère de l’Économie et de l’Innovation du Québec and the Natural Sciences and Engineering Research Council of Canada (Quantum Consortium Program (Quantamole)) for their financial support and M. Trudeau for insightful discussions.

Bibliography77

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] J. Wang, F. Sciarrino, A. Laing, and M. G. Thompson, “Integrated photonic quantum technologies,” \Journal Title Nature Photonics 14 , 273–284 (2020).
2[2] N. C. Harris, J. Carolan, D. Bunandar, et al. , “Linear programmable nanophotonic processors,” \Journal Title Optica 5 , 1623–1631 (2018).
3[3] W. Bogaerts, D. Pérez, J. Capmany, et al. , “Programmable photonic circuits,” \Journal Title Nature 586 , 207–216 (2020).
4[4] S. Aaronson and A. Arkhipov, “The computational complexity of linear optics,” in Proceedings of the forty-third annual ACM symposium on Theory of computing, (2011), pp. 333–342.
5[5] J. B. Spring, B. J. Metcalf, P. C. Humphreys, et al. , “Boson sampling on a photonic chip,” \Journal Title Science 339 , 798–801 (2013).
6[6] A. Crespi, R. Osellame, R. Ramponi, et al. , “Integrated multimode interferometers with arbitrary designs for photonic boson sampling,” \Journal Title Nature Photonics 7 , 545–549 (2013).
7[7] C. S. Hamilton, R. Kruse, L. Sansoni, et al. , “Gaussian boson sampling,” \Journal Title Physical Review Letters 119 , 170501 (2017).
8[8] A. Deshpande, A. Mehta, T. Vincent, et al. , “Quantum computational advantage via high-dimensional gaussian boson sampling,” \Journal Title Science Advances 8 , eabi 7894 (2022).