Block Encoding of Sparse Matrices via Coherent Permutation
Abhishek Setty

Forschungszentrum Jülich, Institute of Quantum Control (PGI-8), D-52425 Jülich, Germany
Institute for Theoretical Physics, University of Cologne, D-50937 Cologne, Germany
Abstract
Block encoding of sparse matrices underpins powerful quantum algorithms such as quantum singular value transformation, Hamiltonian simulation, and quantum linear solvers, yet its efficient gate-level realization for general sparse matrices remains a major challenge. We introduce a unified framework that addresses key obstacles including the overhead of multi-controlled X (MCX) gates, amplitude reordering, and hardware connectivity, enabling simplified block encoding constructions with explicit gate-level implementations. Central to our approach is a connection to combinatorial optimization, which enables systematic assignment of control qubits to satisfy nearest-neighbor connectivity constraints, along with coherent permutation operators that preserve superposition while enabling structured amplitude reordering. We demonstrate our methods on structured sparse matrices, achieving systematic reductions in control overhead and circuit depth. Our framework bridges the gap between theoretical formulations and hardware-efficient quantum circuit implementations.
keywords: Block encoding, Quantum circuits, Quantum linear algebra, Combinatorial optimization
1 Introduction
Block encoding has emerged as a central primitive in modern quantum algorithms, providing a systematic way to embed a matrix into a unitary operator and thereby enabling polynomial transformation of operators via Quantum Singular Value Transformation (QSVT) Gilyén et al. [2019]. The idea was first used implicitly in early breakthroughs such as the Harrow-Hassidim-Lloyd (HHL) algorithm for solving linear systems of equations Harrow et al. [2009] and Hamiltonian simulation techniques Berry et al. [2015], Childs et al. [2017], Childs and Wiebe [2012], and was later formalized by Gilyén et al. [2019]. Since then, block encoding has become indispensable in a wide range of domains including quantum linear algebra, optimization, machine learning, and quantum chemistry Brandao and Svore [2017], Van Apeldoorn et al. [2017], Babbush et al. [2018]. Succinctly, block encoding is the process of embedding a given (possibly non-unitary) matrix into a larger unitary operator as
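$$U_A = \begin{pmatrix} A/\alpha & * \\ * & * \end{pmatrix}, \tag{1}$$
where $\alpha$ is a subnormalization factor ensuring $\|A/\alpha\| \le 1$, $*$ denotes inconsequential blocks, and $\|\cdot\|$ is the spectral norm. The factor $\alpha$ and the presence of the $*$ blocks guarantee that a unitary $U_A$ exists.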
Understanding the importance of block encoding has led to substantial research efforts to optimize its construction for different matrix classes. For arbitrary dense matrices, resource requirements have been well studied Clader et al. [2023], Chakraborty et al. [2018]. Approximate block encodings using single- and two-qubit gates have been developed through the FABLE method Camps and Van Beeumen [2022], Kuklinski and Rempfer [2024], and subsequent improvements using demultiplexor operations have been proposed Li et al. [2025]. For sparse matrices, block encodings have typically been formulated in terms of black-box oracles Gilyén et al. [2019], but these works often omit explicit circuit-level implementations. More detailed realizations have been provided for structured sparsity Camps et al. [2024]. Since the subnormalization factor directly affects amplitude scaling in block encoding and consequently increases circuit depth in certain algorithms, recent works have sought to reduce it. Sünderhauf et al. [2024] proposed schemes for matrices with arithmetic structure and PREP/UNPREP operators inspired by the Linear Combinations of Unitaries (LCU) method Childs and Wiebe [2012], while Yang et al. [2024] introduced a dictionary-based protocol with an improved subnormalization factor. Despite these advances, efficient and fully explicit constructions for block encodings of sparse matrices remain largely unexplored.
In this work, we introduce a quantum data manipulation framework within block encoding that provides finer control over amplitude placement while reducing the control complexity of multi-controlled X (MCX) gates. Our approach shows that arbitrary control configurations can be systematically transformed into structured forms that admit MCX compression, enabling more efficient circuit implementations. We establish a novel connection between coherent amplitude permutation and combinatorial optimization, which allows us to determine optimal assignments of control qubits. This is particularly relevant for quantum hardware with nearest-neighbor connectivity, where long-distance interactions increase noise and circuit depth Linke et al. [2017], Beals et al. [2013], Kutin et al. [2007]. By optimizing control placement, our framework simultaneously simplifies MCX structures and improves hardware compatibility. More broadly, our method integrates quantum circuit design with classical optimization techniques, advancing efficient quantum circuit mapping and compilation Shende et al. [2005], Amy et al. [2013], Cowtan et al. [2019], Murali et al. [2019]. We demonstrate these ideas on structured sparse matrices, illustrating how the proposed framework translates theoretical constructions into practical gate-level implementations suitable for quantum algorithms.
2 Notations and Preprocessing
Assuming familiarity with standard conventions in the quantum computing literature, we establish the following notations and conventions:
- For an matrix, the column is denoted by , where . Its binary representation is given by,
[TABLE]
where and is the number of qubits.
- Qubits in a circuit diagram are ordered increasingly from top to bottom, as illustrated for the three-qubit circuit in Fig. 1(a). The binary state is mapped to the quantum register , such that the highest-index qubit corresponds to , and the lowest-index qubit corresponds to (Fig. 1(b)).
- The Hamming distance Hamming [1950], Li et al. [2022] between two binary strings and , where , is defined as
[TABLE]
where denotes the XOR operation. For instance, .
- Multi-controlled NOT gates are denoted by , where denote the control qubits, denote the control state (Eq. 2), and denote the NOT gate applied on qubit.
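The binary representation and Hamming distance conventions above can be illustrated with a short sketch. The helper names `to_bits` and `hamming` are hypothetical, and the most-significant-bit-first string ordering is an assumption made for this illustration only; it may differ from the qubit ordering in Fig. 1.

```python
def to_bits(j: int, n: int) -> str:
    """Return the n-bit binary string of index j.

    Assumption: most significant bit first (illustrative ordering only).
    """
    if not 0 <= j < 2 ** n:
        raise ValueError("index out of range for n qubits")
    return format(j, f"0{n}b")


def hamming(x: str, y: str) -> int:
    """Count the positions where two equal-length bit strings differ.

    This equals the number of ones in the bitwise XOR of x and y.
    """
    if len(x) != len(y):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(x, y))


print(to_bits(5, 3))          # 101
print(hamming("101", "110"))  # 2
```

For an 8-column matrix, three qubits index the columns, and two column labels at Hamming distance one differ in a single qubit.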
For data preprocessing in block encoding, consider a sparse complex matrix with row index given by and column index given by . It is important to note that this method embeds each data element only once in a row/column. Therefore, we collect unique data elements along each diagonal . For each diagonal:
[TABLE]
where the cardinality of each set is given by . Here refers to diagonals below the main diagonal and refers to diagonals above the main diagonal of a square matrix. The set of rows where each unique data element is repeated along the diagonal is given by , where . The corresponding columns can be directly mapped as . In general, the sets and can be in any order; we therefore choose an order and create a data vector containing only non-zero magnitudes such as,
[TABLE]
By construction contains only positive real values. Using the same order as in Eq. 5, we create a sign vector such as,
[TABLE]
where denotes the sign function and denotes the imaginary component. The dimension of the data vector is denoted by . Note that after creating the data vector , the diagonal of data element is denoted by and the row set is given by .
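The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's exact construction: the diagonal offset is taken as `d = i - j` (so `d > 0` lies below the main diagonal), real and imaginary parts are stored as separate entries with signs in `{+1, -1, +1j, -1j}`, and the ordering of entries (by diagonal, then first occurrence) is an arbitrary choice. The function name `preprocess` is hypothetical.

```python
from collections import defaultdict


def preprocess(A):
    """Collect unique non-zero components along each diagonal of A.

    Returns a list of records (magnitude, sign, d, rows), where
    d = i - j is the diagonal offset (assumed convention) and rows
    lists the row indices where the component occurs on diagonal d.
    """
    groups = defaultdict(list)  # (d, signed component) -> row indices
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            a = complex(a)
            # Real and imaginary parts become separate data elements.
            for part, unit in ((a.real, 1), (a.imag, 1j)):
                if part != 0:
                    groups[(i - j, part * unit)].append(i)
    records = []
    # Stable sort by diagonal keeps first-seen order within a diagonal.
    for (d, value), rows in sorted(groups.items(), key=lambda kv: kv[0][0]):
        magnitude = abs(value)    # positive real entry of the data vector
        sign = value / magnitude  # one of +1, -1, +1j, -1j
        records.append((magnitude, sign, d, rows))
    return records


A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
for record in preprocess(A):
    print(record)
```

For this real tridiagonal matrix the sketch yields three data elements, one per diagonal, each with its list of rows.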
3 Block Encoding
In this section, we elaborate the PREP/UNPREP-based block encoding Sünderhauf et al. [2024], Yang et al. [2024] and present a framework for constructing the corresponding quantum circuits. Consider the following quantum oracles,
1. State Preparation: The oracles PREP and UNPREP together embed into the amplitudes of a quantum state. Here denotes element-wise multiplication between two vectors. Note that we can pad the vector with zeros to fill basis states, if needed.
2. Index Mapping: For a data element in diagonal , rows and columns , the index mapping oracle is composed of two oracles, shift and delete . The oracle performs an injective mapping from columns to rows . The oracle performs deletion of elements from rows .
With these oracles in place, the block encoding scheme is formulated as described in Theorem 1.
Theorem 1.
Let be a matrix that has data collected as , and . If there exists shift oracle such that
[TABLE]
and delete oracle such that,
[TABLE]
and two state preparation oracles PREP and UNPREP such that
[TABLE]
[TABLE]
then the unitary, , as shown in Fig. 2, can block encode with the subnormalization .
Proof: To recover the matrix from its block encoding, the flag qubits (i.e., the data and delete qubits) are initialized and postselected in the state . This is achieved by initializing the bottom register with and postselecting (or measuring) the outcome , as follows:
[TABLE]
where denotes a Kronecker delta function . In case of complex entries, we add their real and imaginary components within the block encoded matrix as . An illustration is shown in Eqs. 16 and 17, and an example of block encoding a tridiagonal complex matrix is presented in Section 9.1.
3.1 State Preparation Oracle
The task of the state preparation oracles PREP/UNPREP is to embed data into the amplitudes of a quantum state. Möttönen et al. (Mottonen et al. [2004], Möttönen et al. [2004]) introduced a state preparation method based on uniformly controlled rotation gates, which can be decomposed into either multi-controlled rotations or sequences of single- and two-qubit gates. This construction leverages classical preprocessing, such as Gray code ordering, to structure the circuit efficiently, reducing the number of controlled rotations required. Later, Iten et al. [2016] proposed a hardware-oriented approach that decomposes arbitrary isometries exactly into single-qubit and CNOT gates via a recursive synthesis procedure based on the cosine-sine decomposition. Both methods are exact and require no ancilla qubits, but generally scale exponentially in gate count and depth for arbitrary state preparation. More recent work has explored depth-optimized schemes Zhang et al. [2022], achieving circuit depths of for an -qubit state or for -sparse states. These improvements, however, often come at the cost of introducing additional ancilla qubits, which in some cases can scale exponentially requiring ancilla qubits.
3.2 Index Mapping Oracle
Without the index mapping oracles and in Fig. 2, the block-encoded matrix reduces to a diagonal form (see Theorem 1),
[TABLE]
The oracle Camps et al. [2024] redistributes the data elements from the main diagonal to a target diagonal of offset by performing conditional shifts of the column index register. Specifically, entries are mapped to , corresponding to a horizontal shift by columns. We define left shift and right shift . To construct the circuit for the shift oracle, let us define the absolute value of the diagonal in binary form:
[TABLE]
For an integer in binary , we define a set of 1-bit positions as . Then for , we define a left shift oracle for data element shifting left by columns as:
[TABLE]
Note that the operators are written from left to right, and the rightmost operator is applied first in the circuit. Similarly, for , we define a right shift oracle for data element shifting right by columns as:
[TABLE]
Using these two shifts Eqs. 13 and 14, the shift oracle can be generalized as:
[TABLE]
This corresponds to decomposing the shift by into a sequence of conditional shifts by powers of two. To visualize the circuit, we consider three matrix qubits and represent left shift (d=1) of data element by one column (see Eq. 13) in Fig. 3(a). Similarly, the right shift (d=-1) of data element by one column (see Eq. 14) is shown in Fig. 3(b). Furthermore, the representation of shifting left the and data elements by two and four columns, respectively is shown in Fig. 3(c).
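The decomposition of a shift into powers of two can be checked classically on the column index. The following is a minimal sketch with hypothetical helper names; it assumes the shift acts cyclically modulo the register size (which is why a separate delete oracle is needed for wrapped-around entries) and that bit position 0 is the least significant bit.

```python
def one_bit_positions(m: int) -> list:
    """Positions k (least significant = 0) where the binary form of m has a 1."""
    return [k for k in range(m.bit_length()) if (m >> k) & 1]


def shift_column(j: int, d: int, n: int) -> int:
    """Shift column index j by d columns on an n-qubit register.

    d > 0 shifts left (towards smaller column indices), d < 0 shifts right.
    The shift by |d| is applied as a sequence of shifts by powers of two,
    one per 1-bit of |d|. Assumption: the shift is cyclic modulo 2**n.
    """
    step = -1 if d > 0 else 1
    for k in one_bit_positions(abs(d)):
        j = (j + step * 2 ** k) % 2 ** n
    return j


print(one_bit_positions(5))   # [0, 2]  (5 = 1 + 4)
print(shift_column(3, 1, 3))  # 2
print(shift_column(0, 1, 3))  # 7  (wraps around; such entries need deletion)
```

In a circuit, each power-of-two shift corresponds to a controlled increment or decrement acting on the upper bits of the column register, conditioned on the data register holding the index of the data element.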
To visualize the block encoded matrix after shifting, we represent an example as follows. Consider the data vector , sign vector and block encoding matrix to be requiring three matrix qubits. If we apply the operator (see Eqs. 13 and 15 and Fig. 3(a)), then the block encoded matrix will be given as
[TABLE]
Note that complex entries are block encoded by adding its non-zero real and imaginary components (see Eqs. 5, 6 and 1). Similarly, if we apply the operator (see Eqs. 14 and 15 and Fig. 3(b)), then the block encoded matrix will be given as
[TABLE]
The delete oracle Sünderhauf et al. [2024] of data item from row is given by:
[TABLE]
The corresponding gate is illustrated in Fig. 3(d).
4 Composition of Multi-Controlled X Gates
In this section, we analyze the composition and simplification of multi-controlled X (MCX) gates that arise in index-mapping oracles.
Consider two successive shift operators (see Eq. 16 and Fig. 3(a)) acting on three matrix qubits. Expanding both operators, we obtain
[TABLE]
Since the two operators act on the same qubits, we can reorder the factors by grouping terms acting on identical control and target qubits. This reordering is valid because the controlled operations corresponding to distinct control states and act on orthogonal subspaces, i.e., which implies that the corresponding MCX gates commute. Therefore, each group forms a commuting composition of MCX gates acting on the same qubits, which we denote as . A similar structure arises in deletion operations across multiple rows (see Fig. 3(d)). In both cases, we obtain compositions of MCX gates acting on identical sets of control and target qubits. Such compositions can be simplified under suitable conditions.
Let denote the index set of qubits (see Eq. 2 and Fig. 1). Let
[TABLE]
be the set of all -bit binary strings, where
[TABLE]
Consider a composition of MCX gates acting on qubits:
[TABLE]
where is the target qubit and the controls act on the remaining qubits. Let the set of control strings be
[TABLE]
The simplification of such compositions is possible under certain conditions as described in the following Theorem 2.
Theorem 2.
Let . Suppose there exists a subset of indices with such that
[TABLE]
and the substrings on the remaining indices satisfy
[TABLE]
Then
[TABLE]
where .
Proof: Each MCX gate can be written as
[TABLE]
where the identity operator has dimension corresponding to the number of qubits it acts on. Let denote the set of fixed indices where , and let denote the varying indices. We define
[TABLE]
where the substrings enumerate all binary strings.
Without loss of generality, we assume that the qubits are ordered such that the indices in precede those in . This does not affect the final operation and allows us to rewrite the control state as,
[TABLE]
Then the product of MCX gates can be written as,
[TABLE]
In this product expansion, due to orthogonality , the cross-terms vanish, leaving the sum
[TABLE]
where , since the summation of projectors over complete computational basis forms the identity operator. Therefore, the above expression can be further simplified to,
[TABLE]
In the special case , the control set spans the entire computational basis, and hence
[TABLE]
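The compression statement of Theorem 2 can be verified by brute force on small registers, treating each MCX gate as a permutation of computational basis states. This is an illustrative sketch: the functions `mcx` and `compress` are hypothetical names, basis states are tuples of bits indexed by qubit, and control strings are tuples aligned with the list `control_qubits`.

```python
from itertools import product


def mcx(state, controls, target):
    """Apply one MCX gate to a basis state (a tuple of bits).

    controls maps a qubit index to the bit value it must hold.
    """
    if all(state[q] == v for q, v in controls.items()):
        state = state[:target] + (1 - state[target],) + state[target + 1:]
    return state


def compress(control_strings, control_qubits):
    """Return the reduced controls {qubit: bit} if the set compresses, else None.

    The set compresses when all strings agree on some fixed positions and
    the remaining positions enumerate every possible bit pattern.
    """
    strings = set(control_strings)
    first = next(iter(strings))
    fixed = [i for i in range(len(first))
             if all(s[i] == first[i] for s in strings)]
    n_varying = len(first) - len(fixed)
    if len(strings) != 2 ** n_varying:
        return None
    return {control_qubits[i]: first[i] for i in fixed}


control_qubits, target = [0, 1], 2
C = [(0, 0), (0, 1)]  # qubit 0 fixed to 0, qubit 1 takes both values
reduced = compress(C, control_qubits)
for state in product((0, 1), repeat=3):
    out = state
    for s in C:  # composition of the individual MCX gates
        out = mcx(out, dict(zip(control_qubits, s)), target)
    assert out == mcx(state, reduced, target)
print(reduced)  # {0: 0}
```

When the control strings cover the whole computational basis, `compress` returns an empty dictionary, which corresponds to a plain X gate on the target, matching the special case at the end of the proof.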
5 Combinatorial Optimization Based Mapping of Basis States
In the previous section (see Section 4), we showed that a composition of MCX gates can be compressed when the control set exhibits a fixed set of indices , with , such that
[TABLE]
and the substrings over the remaining indices enumerate all binary strings of length .
In this section, we address the general case where the set does not satisfy these structural conditions. We propose to permute amplitudes among computational basis states so as to construct a modified set that satisfies the conditions required for compression (see Theorem 2).
For a given set of fixed indices , with , we define a structured set such that
[TABLE]
and the substrings over the remaining indices exhaust all configurations,
[TABLE]
This construction ensures that and that satisfies the structural conditions required for MCX compression.
The objective is to find such a set together with a bijection that minimizes the total Hamming distance
[TABLE]
We formalize this as a combinatorial optimization problem below.
Theorem 3.
Given an arbitrary set of binary strings with and a set of fixed indices with , there exists a set satisfying the above structural constraints and a bijection that minimizes
[TABLE]
Proof: Let . We first construct a candidate set . Define the fixed bit pattern over indices :
[TABLE]
where denotes the substring of restricted to indices , and mode returns the most frequent substring. In the case of ties, multiple valid choices of may exist. Using , we construct the set by fixing
[TABLE]
and assigning the remaining bits such that
[TABLE]
Thus, contains exactly all binary strings consistent with the fixed pattern on , and therefore .
Next, we separate the common elements:
[TABLE]
Since , it follows that .
We define the cost matrix with entries
[TABLE]
where and .
The problem of finding the optimal bijection reduces to the following integer linear program:
[TABLE]
where indicates that .
This is the classical linear assignment problem, which admits an optimal solution and can be solved in time cubic in the size of the cost matrix using the Hungarian algorithm Kuhn [1955], Munkres [1957], Burkard et al. [2012], Wolsey [2020]. The existence of an optimal solution establishes the existence of the required bijection .
Choosing the bitwise mode maximizes the overlap , thereby minimizing the size of the reduced sets and , which reduces the computational cost of the assignment problem.
In general, multiple optimal bijections may exist. A canonical solution can be obtained by imposing deterministic tie-breaking rules (e.g., lexicographic ordering) or by perturbing the cost matrix as with arbitrarily small and distinct perturbations . Such perturbations ensure uniqueness of the minimizer while preserving optimality of the original problem Burkard et al. [2012], Schrijver and others [2003], Korte and Vygen [2008].
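The construction in the proof can be sketched for small instances as follows. This is an illustration under assumptions: bit strings are Python strings indexed from the left, the input set has exactly as many elements as the structured set, and the assignment is solved by brute force over permutations rather than by the Hungarian algorithm. For larger sets one would typically call a dedicated solver such as `scipy.optimize.linear_sum_assignment` on the Hamming cost matrix. All function names are hypothetical.

```python
from collections import Counter
from itertools import permutations, product


def hamming(x, y):
    """Number of differing positions between two equal-length bit strings."""
    return sum(a != b for a, b in zip(x, y))


def structured_set(C, fixed):
    """Fix the most frequent pattern of C on `fixed`; enumerate all other bits."""
    n = len(C[0])
    pattern = Counter(tuple(c[i] for i in fixed) for c in C).most_common(1)[0][0]
    free = [i for i in range(n) if i not in fixed]
    result = []
    for bits in product("01", repeat=len(free)):
        s = [None] * n
        for i, b in zip(fixed, pattern):
            s[i] = b
        for i, b in zip(free, bits):
            s[i] = b
        result.append("".join(s))
    return result


def best_bijection(C, fixed):
    """Map C onto the structured set with minimal total Hamming distance."""
    Cp = structured_set(C, fixed)
    if len(Cp) != len(C):
        raise ValueError("len(C) must equal 2 ** (number of free indices)")
    common = set(C) & set(Cp)  # strings already in place map to themselves
    src = [c for c in C if c not in common]
    dst = [c for c in Cp if c not in common]
    # Brute-force assignment; only feasible for small reduced sets.
    best = min(permutations(dst),
               key=lambda p: sum(hamming(a, b) for a, b in zip(src, p)))
    mapping = {c: c for c in common}
    mapping.update(zip(src, best))
    return mapping


C = ["000", "001", "010", "111"]
print(best_bijection(C, fixed=[0]))
```

In this example three of the four strings already start with `0`, so only `111` has to be moved, and it is mapped to `011` at Hamming distance one.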
6 Coherent Permutation Using Multi-Controlled X Gates
In this section, we introduce a coherent permutation of amplitudes among basis states using MCX gates. Here, coherent refers to a unitary (reversible) transformation that preserves superposition and relative phases, without measurement or state collapse.
Consider a -qubit quantum state , where the computational basis is indexed by binary strings over the index set (see Eq. 2 and Fig. 1). Any basis state is represented by a binary string . We begin by defining a primitive operation that swaps amplitudes between two basis states differing in a single qubit.
Definition 6.1.
A_SWAP: Let be binary strings such that they differ only at qubit , i.e.,
[TABLE]
Define the common control string where . Then the amplitudes corresponding to and are swapped via
[TABLE]
by applying the gate
[TABLE]
Note that A_SWAP permutes amplitudes without affecting other basis states. We now generalize this operation to permute amplitudes between two subsets of basis states.
Definition 6.2.
A_PERMUTE: Let with , and let be a bijection. Define
[TABLE]
For each , define Choose a path (sequence) satisfying
[TABLE]
and the constraint Then define the local walk operator
[TABLE]
where the product is ordered from left to right in increasing . Finally,
[TABLE]
After each A_SWAP, the amplitudes are updated implicitly. The overall operator A_PERMUTE is unitary and consists of a composition of MCX gates (see Section 4), possibly acting on different control and target qubits. As shown in Theorem 2, a composition of MCX gates can be compressed into a single MCX gate when the control states satisfy specific structural constraints. We now consider the inverse construction: expressing a single MCX gate as a composition of MCX gates.
Theorem 4.
Let be a control state defined on indices , and . Then there exists a set of control states such that
[TABLE]
and
[TABLE]
The proof follows directly by reversing the construction in Theorem 2. Operationally, this implies that a single MCX gate can be interpreted as simultaneously performing A_SWAP operations across all pairs of basis states whose control patterns belong to . Consider an example of three qubits and a quantum state vector . The controlled-NOT gate
[TABLE]
which swaps the amplitudes as
[TABLE]
Finally, note that if available as a native gate, a SWAP operation can also be used to permute amplitudes. Otherwise, a SWAP gate can be decomposed into three CNOT gates.
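The amplitude-level effect of these operations can be simulated classically. The sketch below is an illustration under assumptions: a state is a dictionary from bit strings to amplitudes, string position `i` plays the role of qubit `i`, and the walk simply flips differing bits from left to right without enforcing any path constraint from Definition 6.2. The names `a_swap`, `walk`, and `mcx_amps` are hypothetical.

```python
def a_swap(amps, x, y):
    """Swap the amplitudes of two basis states that differ in exactly one bit.

    In a circuit this is one MCX gate targeting the differing qubit, with
    controls on all remaining qubits set to the shared bit values.
    """
    if sum(a != b for a, b in zip(x, y)) != 1:
        raise ValueError("states must differ in exactly one bit")
    out = dict(amps)
    out[x], out[y] = amps.get(y, 0), amps.get(x, 0)
    return out


def walk(amps, src, dst):
    """Move the amplitude of src to dst, flipping one differing bit per step."""
    cur = src
    for i in range(len(src)):
        if cur[i] != dst[i]:
            nxt = cur[:i] + dst[i] + cur[i + 1:]
            amps = a_swap(amps, cur, nxt)
            cur = nxt
    return amps


def mcx_amps(amps, controls, target):
    """Apply an MCX gate with a partial control pattern {position: bit}."""
    out = {}
    for s, a in amps.items():
        if all(s[q] == v for q, v in controls.items()):
            flipped = "1" if s[target] == "0" else "0"
            s = s[:target] + flipped + s[target + 1:]
        out[s] = a
    return out


state = {"000": 0.6, "011": 0.8}
print(walk(state, "000", "011"))  # amplitude 0.6 now sits on "011"

# One CNOT (control on position 0, target on position 2) acts as two A_SWAPs.
psi = {"000": 5, "100": 1, "101": 2, "110": 3, "111": 4}
two_swaps = a_swap(a_swap(psi, "100", "101"), "110", "111")
assert mcx_amps(psi, {0: "1"}, 2) == two_swaps
```

Note that the walk displaces the amplitudes of intermediate states along the path (here `011` ends up on `010`), so the choice of path matters when several amplitudes are permuted.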
7 Optimized Index Mapping Oracle
We have seen that the index mapping oracles and (Section 3.2) can shift and delete the data elements in the block encoded matrix. We can compress the composition of MCX gates when the control set exhibits a fixed set of indices and satisfies the structural constraints in Section 4. Otherwise, we can determine an arbitrary and find a set such that (Section 5). The choice of fixed indices determines the qubits on which the MCX gates are applied. We can therefore use this to our advantage on superconducting quantum hardware and apply the MCX gates on nearest-neighbor qubits, reducing the control complexity.
In this section, we discuss the optimized operations in the index mapping oracle through several examples of block encoding sparse matrices, covering a variety of applications.
7.1 Shift
In the combined operation of shift oracles (Section 3.2), the objective is to get nearest-neighbor MCX gates. To achieve this, the choice of comes from choosing the qubits in closer to matrix qubits . The choice of determines the mapping . Then the amplitudes are permuted using (refer Definition 6.2). The visualization of this combined shift operation is shown in Fig. 5(a). After shifting, the amplitudes are rearranged back to retain the original order, avoiding confusion in subsequent operations.
We now present some examples for intuitive understanding. Consider block encoding an matrix with three matrix qubits and two data qubits . Let , basis states for are .
Example 7.1.
The data elements are to be shifted left by one column using the operators , corresponding to a many-to-one mapping.
Solution: The control set satisfies the structural constraints in Section 4 such that , fixed index for first composition of MCX gates as in Eqs. 19 and 4. Then the combined shift operation is given by
[TABLE]
where denote no control gate on qubit . The corresponding circuit representation is shown in Fig. 5(b).
Example 7.2.
The data elements are to be shifted left by one column using the operators , corresponding to a many-to-one mapping.
Solution: The control states do not satisfy the constraints in Theorem 2. So we choose the fixed index for the first composition of MCX gates as in Eqs. 19 and 4. We apply the (see Fig. 5(a)), where the amplitudes are swapped using a CNOT gate with control value 0, as shown in Fig. 5(c). After this permutation, the combined shift operator can be applied as:
[TABLE]
Example 7.3.
Assume a data vector of 7 values padded with a zero: . The task is to shift the data items left by one column.
Solution: According to Theorem 2, the size of the control set should be a power of two . Since the task involves three data elements, a naive approach would be to apply the shift operation individually. Alternatively, one can exploit the zero in the data vector and perform a combined shift on four data elements , as shifting a zero does not affect the block-encoded matrix (Eq. 11). The combined operation for these basis states requires permutation, as illustrated in Fig. 5(a).
Note that shifting a single data item left and then right results in the identity operation, i.e., (refer Figs. 3(a) and 3(b)). This property can be exploited when applying combined operations to simplify MCX gates.
7.2 Delete
We have seen how a single data element can be deleted in a specific row Fig. 3(d). In this section, we generalize this to deletion in multiple rows. Consecutive delete operations of a single data element across multiple rows result in a composition of MCX gates with control and target on the same qubits. When the control states of such a composition satisfy the constraints in Theorem 2, they can be combined into a single MCX gate. Otherwise, one can choose a set of fixed indices and determine (refer Section 5). The choice of comes from choosing the qubits in close to and to achieve nearest-neighbor connectivity. The combined delete operation along with the permutation operator is presented in Fig. 5(d).
Example 7.4.
Consider three matrix qubits . The data element is to be deleted in rows using the operators , corresponding to a one-to-many mapping.
Solution: The set of control states does not have and hence does not satisfy the constraints in Theorem 2. We choose and obtain (see Section 5) using the linear sum assignment algorithm Crouse [2016]. The corresponding permutation of amplitudes is given as
[TABLE]
The combined deletion is given by
[TABLE]
After the deletion, the rows are permuted back to their original order. The circuit for this example is shown in Fig. 5(e).
Furthermore, a many-to-many mapping is also possible, where more than one data element can be deleted in multiple rows, provided that the set of rows to be deleted is common for all data elements.
7.3 Insert
We have seen how to delete a data element in multiple rows. However, if the task is to insert a data item into one or a few rows, performing deletion on all other rows would require exponentially many MCX gates. Therefore, in this section, we introduce the insert operator, as illustrated in Example 7.5.
Example 7.5.
Consider an example of block encoding a matrix of . The data elements are to be inserted in different rows , respectively, corresponding to a one-to-one mapping.
Solution: To insert a single data element in row , we define
[TABLE]
where means deleting in all rows and it is compressed into a single MCX gate as presented in Eqs. 20 and 2. Since the proposed block encoding method places each data element in every row (Eq. 11), we first delete the data element from all rows and then apply deletion on the desired row . This effectively inverts the deletion on the desired row, resulting in the insertion of the data element in that row alone.
To insert a set of data elements into rows , respectively, we formulate
[TABLE]
The circuit representation for this task is shown in Fig. 5(f), where (represented as ) is included (if needed) for permutation of amplitudes for a chosen .
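The logic of the insert operator can be traced with a classical bookkeeping sketch. It models, for one data element, the value of the delete flag in each row: a flag of 0 means the element survives postselection in that row. The function names are hypothetical, and the model ignores amplitudes and permutations; it only tracks which rows keep the element.

```python
def delete_rows(flags, rows):
    """Flip the delete flag on the given rows (one MCX gate per row)."""
    return [f ^ (i in rows) for i, f in enumerate(flags)]


def insert_rows(n_rows, rows):
    """Keep a data element only in `rows` via delete-all followed by delete.

    Step 1: delete in all rows, which compresses to a single X gate.
    Step 2: apply the delete gate again on the desired rows, which flips
            their flags back and so restores the element there.
    """
    flags = [0] * n_rows              # element initially present in every row
    flags = [f ^ 1 for f in flags]    # delete everywhere (one gate)
    flags = delete_rows(flags, rows)  # undo the deletion on the chosen rows
    return [i for i, f in enumerate(flags) if f == 0]


print(insert_rows(8, {3}))  # [3]
```

For a single target row among 8, this uses two gates instead of the seven individual deletions that removing the element from every other row would take.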
8 Complete Circuit
In this section, we present the complete circuit for block encoding of sparse matrices, including the optimized index mapping oracle (see Section 7). For clarity, the procedure can be summarized as follows:
1. Given a sparse matrix, construct the data and sign vectors (Eqs. 5 and 6).
2. Obtain the state preparation oracle for the PREP and UNPREP operators (see Theorem 1 and Section 3.1).
3. Tabulate the required shift, delete, and insert operations for each data element.
4. Identify common operators to apply the optimized index mapping oracle (Section 7).
5. Check for control states in , and generate if necessary (Section 5).
6. Determine the coherent permutation gates for amplitude reordering, if required (Section 6).
7. Apply all operations within a single circuit to obtain the scaled matrix block encoded as in Eq. 1, and multiply by the subnormalization factor to recover the original matrix.
An overview of the circuit architecture for block encoding is shown in Fig. 6(a). In this design, the amplitudes are permuted back after every combined operation, ensuring that the order of amplitudes remains consistent throughout the circuit. Let the amplitude-permuting operator A_PERMUTE consist of MCX gates. Then,
[TABLE]
where the inverse operation uses the same MCX gates applied in reverse order.
A potential optimization is illustrated in Fig. 6(b). Here, the state preparation oracle initializes the amplitudes in an order already suited to the first combined shift operator. Then the amplitudes are permuted only once before each combined operation, and not reordered back to their original configuration at intermediate steps. This means the order of amplitudes evolves after each A_PERMUTE application, and it is finally restored by applying at the end of the circuit. If is implemented strictly as in Eq. 28, the construction closely resembles Fig. 6(a).
A promising research direction is to explore permutation of amplitudes in arbitrary order to avoid strict reversal using the techniques discussed in Section 6. That said, this requires careful bookkeeping of the evolving amplitude order, and permuting arbitrary orders may become increasingly costly as the number of permutations grows.
9 Applications
In this section, we present two examples of block encoding of sparse matrices: a complex tridiagonal matrix and a structured real matrix.
9.1 Complex Tridiagonal Matrix
Consider a complex tridiagonal matrix of the form,
[TABLE]
where and . The corresponding data vector is
[TABLE]
with sign vector
[TABLE]
State preparation requires data qubits (refer Theorem 1), where the state is padded with zeros. For block encoding, the data elements in basis states must be shifted left by one column and deleted in row as in Eq. 16. If required, amplitudes can be permuted for nearest-neighbor MCX gate connectivity. Similarly, the data elements in basis states must be shifted right by one column and deleted in row as in Eq. 17.
The circuit representation of this construction is shown in Fig. 7 and provides a practical gate-level realization that can be directly employed within quantum algorithms. Note that zeros in the state vector can also be leveraged for combined shifting (see Example 7.3), thereby reducing the control overhead of the MCX gates.
9.2 Structured Real Matrix
Consider a sparse real matrix with the structure shown in Fig. 8(a). Following the block-encoding procedure outlined in Section 8, the corresponding data vector is and sign vector is . The number of data qubits required for block encoding is , where the state vector is padded with zeros for state preparation.
The block-encoding operations (shift, delete, insert) are summarized in Table 1 (see Section 7). Note that two distinct values , occur on the main diagonal. Within the block-encoding framework, these appear as the combined diagonal value (see Eq. 11). To address this case, we outline three possible encoding strategies, with the objective of reducing both the number of MCX gates and the subnormalization factor:
1. Without modification: ,
   Delete operations: ,
   Contribution to : .
2. With modification: ,
   Delete operations: ,
   Contribution to : ,
   This is advantageous when .
3. With modification: ,
   Delete operations: ,
   Contribution to : ,
   This is advantageous when .
For demonstration, we choose the second approach as illustrated in Table 1. Next, we determine the common shift operators (refer Section 7), shown in Table 2.
We demonstrate the optimized index mapping oracle for the block-encoding operations in Table 1. For demonstration purposes, we consider the shift in Table 2. For , the control states do not satisfy the structural requirements of Theorem 2. Therefore, we determine and apply the permutation operator as shown in Fig. 5(a). Following Theorem 3, the mapping is obtained using the linear sum assignment algorithm Crouse [2016]. Finally, the gates are generated to permute according to Definition 6.2, resulting in the following walk operators:
[TABLE]
The MCX gates implementing the mapping in Eq. 30 (see Definition 6.2) correspond to the operator , as illustrated in Fig. 8(b). Note that, alternatively, one may employ multi-controlled SWAP gates for specific cases such as , or consider multi-swapping strategies as discussed in Theorem 4.
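The assignment step behind such a mapping can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: the cost of sending one basis state to another is taken to be their Hamming distance, the source and target states are hypothetical, and the problem is solved by brute force over all bijections rather than with the algorithm of Crouse [2016]. For larger instances a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would be the natural choice.

```python
from itertools import permutations

def hamming(a, b):
    """Number of bit positions in which basis states a and b differ."""
    return bin(a ^ b).count("1")

def assign(sources, targets):
    """Brute-force linear sum assignment: the bijection sources -> targets
    with the smallest total Hamming cost (assumed cost model)."""
    best_cost, best_map = None, None
    for candidate in permutations(targets):
        cost = sum(hamming(s, t) for s, t in zip(sources, candidate))
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, dict(zip(sources, candidate))
    return best_cost, best_map

# Hypothetical control states that share no common bit pattern ...
sources = [0b001, 0b010, 0b100, 0b111]
# ... and hypothetical target states that all share the leading bit 1.
targets = [0b100, 0b101, 0b110, 0b111]

cost, mapping = assign(sources, targets)
```

After the mapping, every controlled state has the leading bit set, so a single control on that qubit selects the whole group. The brute-force search scales factorially and is only meant to make the objective explicit.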
Considering the combined delete (refer Table 1), the control states require permutation. Therefore, we determine and implement , as shown in Fig. 5(d). Following the same protocol as for the previous case, we obtain the mapping , leading to the following walk operators:
[TABLE]
The MCX gates corresponding to the mapping in Eq. 31 (see Definition 6.2) are implemented through , as illustrated in Fig. 8(c). The complete circuit for block encoding the matrix in Fig. 8(a) is obtained by combining the shift, delete, insert, and permutation operators, as shown in Fig. 6. This circuit provides a practical gate-level realization that can be directly employed within quantum algorithms.
10 Discussion
In this work, we developed a systematic framework for the block encoding of sparse matrices with explicit gate-level constructions and accompanying compression strategies. Our approach provides a concrete pathway from abstract oracle-based formulations to hardware-realizable quantum circuits. In particular, we presented an intuitive interpretation of the PREP/UNPREP-based block encoding framework and extended it to accommodate complex-valued matrices.
A central observation in our analysis is that the subnormalization factor (see Eqs. 1 and 5) arising in standard constructions typically exceeds the spectral norm of the encoded matrix. Whether one can construct block encodings whose subnormalization factor matches the spectral norm remains an important open question, with direct implications for the efficiency of QSVT-based algorithms. Our framework offers a complementary perspective that may facilitate more systematic estimation of quantum resource requirements Clader et al. [2023], Chakraborty et al. [2018].
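The gap between the two quantities can be made concrete with a small numerical sketch. It assumes, for illustration only, that the subnormalization factor is the sum of the magnitudes of the band values, as in a linear-combination-style encoding; this may differ from the definition in Eq. 5. The matrix is a real symmetric tridiagonal Toeplitz matrix, whose eigenvalues are known in closed form, and the values of `n`, `a`, and `b` are hypothetical.

```python
import math

def spectral_norm_tridiag_toeplitz(n, a, b):
    """Spectral norm of the real symmetric n x n tridiagonal Toeplitz matrix with
    `a` on the diagonal and `b` on both off-diagonals, using the closed-form
    eigenvalues a + 2 b cos(k pi / (n + 1)) for k = 1..n."""
    return max(abs(a + 2 * b * math.cos(k * math.pi / (n + 1)))
               for k in range(1, n + 1))

n, a, b = 8, 2.0, -1.0   # hypothetical example: a discrete Laplacian stencil

norm = spectral_norm_tridiag_toeplitz(n, a, b)
subnormalization = abs(a) + 2 * abs(b)   # assumed l1-type factor, one term per band
ratio = norm / subnormalization          # strictly below 1 for finite n
```

In this example the ratio approaches 1 as `n` grows, so the loss is mild. For matrices whose bands partially cancel, the same l1-type factor can be much larger than the spectral norm.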
At the circuit level, we showed that block encoding naturally gives rise to structured compositions of MCX gates, and that these compositions can be compressed into single MCX operations under suitable conditions (Section 4). This directly reduces circuit depth and control overhead. We further established a connection between amplitude reordering and combinatorial optimization, formulating the assignment of MCX control qubits as an optimization problem constrained by hardware connectivity. This enables circuit constructions that minimize permutation overhead while satisfying nearest-neighbor constraints, thereby linking quantum circuit synthesis with classical optimization techniques.
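A simple instance of such a compression can be checked classically. The sketch below is one elementary case, not necessarily the conditions of Section 4: two MCX gates that share a target and differ only in the required value of one control qubit act together as a single gate without that control. Gates are modelled as maps on computational basis states, and the helper `mcx` is hypothetical.

```python
def mcx(state, controls, target):
    """Flip bit `target` of the integer `state` if every control qubit in
    `controls` (a dict qubit -> required bit value) matches."""
    if all((state >> qubit) & 1 == value for qubit, value in controls.items()):
        return state ^ (1 << target)
    return state

n_qubits = 3
basis = range(2 ** n_qubits)

# Two MCX gates on target qubit 2: qubit 1 must be 1, qubit 0 must be 0 or 1.
two_gates = [mcx(mcx(s, {0: 0, 1: 1}, 2), {0: 1, 1: 1}, 2) for s in basis]

# A single CX gate on target qubit 2, controlled on qubit 1 only.
one_gate = [mcx(s, {1: 1}, 2) for s in basis]

compressed = (two_gates == one_gate)
```

Because the gates only permute basis states, agreement on every basis state implies that the two circuits are the same unitary.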
Our coherent permutation operators provide an additional advantage: they implement amplitude reordering through fully unitary operations, preserving superposition and entanglement throughout the computation. By expressing permutations as structured compositions of MCX gates (Theorem 4), our framework enables systematic decomposition into two-qubit primitives compatible with current hardware. This suggests a broader perspective in which permutation design itself becomes a resource for circuit optimization.
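The following sketch illustrates why permutations built from MCX gates preserve the state. It relies on a standard fact rather than on Theorem 4 itself: an X gate on one qubit, controlled on all remaining qubits, exchanges the amplitudes of exactly two basis states that differ in that bit. The example state and the helper name are hypothetical.

```python
import math

def fully_controlled_x(amps, a, b):
    """Swap the amplitudes of basis states a and b, which must differ in exactly
    one bit. This is the action of an X gate on that bit, controlled on all
    other qubits being in the state they have in `a`."""
    if bin(a ^ b).count("1") != 1:
        raise ValueError("basis states must differ in exactly one bit")
    out = list(amps)
    out[a], out[b] = out[b], out[a]
    return out

# Hypothetical normalized two-qubit state with complex amplitudes.
scale = 1 / math.sqrt(30)
amps = [1 * scale, 2j * scale, -3 * scale, 4 * scale]

# Two gates in sequence realize a 3-cycle on the basis states 00, 01, 11.
out = fully_controlled_x(amps, 0b00, 0b01)
out = fully_controlled_x(out, 0b01, 0b11)

norm_before = sum(abs(x) ** 2 for x in amps)
norm_after = sum(abs(x) ** 2 for x in out)
```

The amplitudes are relocated but never measured or copied, so the norm and all relative phases are unchanged.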
We introduced an optimized index mapping oracle that yields nearest-neighbor MCX interactions, making the construction well-suited for superconducting qubit architectures. By integrating all components, we obtained a complete circuit-level realization of block encoding for sparse matrices (Fig. 6) and highlighted how future architectures may further benefit from low-overhead permutation layers.
Finally, we validated our framework on two representative examples: a complex tridiagonal matrix and a structured real matrix. These case studies demonstrate that the full pipeline, from theoretical construction to executable circuits, can be implemented in a consistent and scalable manner. The resulting circuits are directly applicable to key quantum algorithms such as QSVT, HHL, and Hamiltonian simulation Martyn et al. [2021], thereby advancing the practical deployment of block encoding in near-term and fault-tolerant quantum computing.
Acknowledgements
This research was funded through the European Union’s Horizon Programme (HORIZONCL4-2021-DIGITALEMERGING-02-10), Grant Agreement 101080085 (QCFD).
References
- [1] M. Amy, D. Maslov, M. Mosca, and M. Roetteler (2013) A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32(6), pp. 818–830.
- [2] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven (2018) Encoding electronic spectra in quantum circuits with linear T complexity. Physical Review X 8(4), pp. 041015.
- [3] R. Beals, S. Brierley, O. Gray, A. W. Harrow, S. Kutin, N. Linden, D. Shepherd, and M. Stather (2013) Efficient distributed quantum computing. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 469(2153), pp. 20120686.
- [4] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma (2015) Simulating Hamiltonian dynamics with a truncated Taylor series. Physical Review Letters 114(9), pp. 090502.
- [5] F. G. Brandao and K. M. Svore (2017) Quantum speed-ups for solving semidefinite programs. In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), pp. 415–426.
- [6] R. Burkard, M. Dell’Amico, and S. Martello (2012) Assignment problems: revised reprint. SIAM.
- [7] D. Camps, L. Lin, R. Van Beeumen, and C. Yang (2024) Explicit quantum circuits for block encodings of certain sparse matrices. SIAM Journal on Matrix Analysis and Applications 45(1), pp. 801–827.
- [8] D. Camps and R. Van Beeumen (2022) FABLE: fast approximate quantum circuits for block-encodings. In 2022 IEEE International Conference on Quantum Computing and Engineering (QCE), pp. 104–113.
