Sub-Pixel Registration of Wavelet-Encoded Images

Vildan Atalay Aydin; Hassan Foroosh

arXiv:1705.00430 · cs.CV · May 2, 2017


TL;DR

This paper introduces a novel wavelet domain sub-pixel registration method that accurately aligns images directly from sparse wavelet coefficients, enhancing super-resolution and imaging applications.

Contribution

The paper presents a new approach for direct wavelet domain registration that decouples parameters and works effectively with sparse coefficients, improving accuracy and efficiency.

Findings

1. Outperforms state-of-the-art methods on simulated data
2. Maintains high accuracy with sparse wavelet coefficients
3. Effective for real-world imaging applications


TABLE I: Notation

$I (x, y)$	Reference image
$J (x, y, σ, θ, t_{x}, t_{y})$	Sensed image to be registered to $I$
$σ$ , $θ$ , $t_{x}$ , $t_{y}$	Transformation parameters to be estimated: scale, rotation angle, and shifts along the two axes, respectively
$A, a, b, c$	Wavelet transform approximation, horizontal, vertical, and diagonal detail coefficients, respectively
$h$	Number of hypothetically added levels
$s_{x}$ ( $s_{y}$ )	Perceived horizontal (vertical) integer shift of wavelet coefficients after the hypothetically added levels ( $h$ )

TABLE II: Comparison of the proposed method with other baseline methods in estimated shifts, PSNR, and MSE.

Image	Exact shift	Keren [122]			Guizar [123]			Szeliski [89]			Proposed
		Est.	PSNR	MSE	Est.	PSNR	MSE	Est.	PSNR	MSE	Est.	PSNR	MSE
a	0.5 0.5	0.4878 0.5427	56.91	0.11	0.56 0.53	50.05	0.56	0.5017 0.5009	80.91	0	0.5 0.5	Inf	0
a	0.25 -0.125	0.2456 -0.1212	72.23	0.003	0.29 -0.16	53.10	0.28	0.2518 -0.1243	80.67	0	0.25 -0.125	Inf	0
	-0.375 -0.4	-0.3826 -0.4146	63.99	0.02	-0.42 -0.42	52.67	0.31	-0.3732 -0.3990	80.36	0	-0.375 -0.4023	82.59	0
	-0.625 0.75	-0.6958 -0.8268	47.81	0.95	-0.70 0.81	48.00	0.90	-0.6231 0.7508	80.17	0	-0.625 0.75	Inf	0
b	0.33 -0.33	0.3347 -0.3008	54.19	0.24	0.27 -0.33	45.91	1.61	0.3275 -0.3316	72.49	0.003	0.3281 -0.3438	60.74	0.05
b	0.167 0.5	0.1641 0.6154	42.08	3.91	0.11 0.55	44.78	2.10	0.1633 0.4977	69.04	0.007	0.1719 0.5	67.76	0.01
	-0.875 -0.33	-0.8639 -0.2986	52.58	0.35	-0.92 -0.33	48.44	0.91	-0.8783 -0.3316	70.51	0.005	-0.875 -0.3438	60.40	0.06
	-0.125 0.67	-0.1309 0.8230	39.53	7.01	-0.08 0.75	42.94	3.20	-0.1277 0.6695	72.62	0.003	-0.125 0.6719	77.60	0.001

TABLE III: Comparison of average PSNR, MSE, and running time for rotation recovery over 121 simulations.

Image	Vandewalle [121]			Proposed
	PSNR	MSE	Time (s)	PSNR	MSE	Time (s)
a	32.83	6.59	0.16	42.94	1.75	0.49
b	37.53	4.09	0.16	43.53	1.81	0.49

TABLE IV: Comparison of PSNR, MSE, and time for rotation and translation recovery (shifts in pixels, θ in degrees).

Image	Exact $(x, y, θ)$	Vandewalle [121]				Proposed
		Estimate	PSNR	MSE	Time (s)	Estimate	PSNR	MSE	Time (s)
a	$(0.5, -0.25, 20)$	$(0.8, -0.5, 19.8)$	23.2	16.4	0.1	$(0.5, -0.25, 20)$	25.2	13.07	5.55
b	$(-0.375, -0.375, -10)$	$(-0.337, -0.62, -10.2)$	21.17	19.97	0.09	$(-0.406, -0.375, -10)$	22.05	19.7	23.4
c	$(-0.4375, 0.875, -30)$	$(-0.64, 0.526, -30)$	21.01	19.02	0.09	$(-0.39, 0.875, -30.3)$	20.2	20.86	66

TABLE V: Our results for scale, rotation, and translation (shifts in pixels, θ in degrees).

Img	Exact $(x, y, θ, σ)$	Results
		Estimate	Time (s)
a	$(0.5, 0.25, -50, 2)$	$(0.5, 0.25, -50.1, 2)$	25.7
a	$(0.5, 0.25, 50, 2)$	$(0.5, 0.25, 49.8, 2)$	4.94
b	$(-0.25, 0.25, 10, 1/4)$	$(-0.28, 0.28, 10.2, 1/4)$	105.3
c	$(-0.5, -0.375, 30, 1/2)$	$(-0.5, -0.375, 30.1, 1/2)$	93.6

TABLE VI: Comparison of our method with other methods for real-world examples from [119] and [120] in PSNR and MSE.

Dataset	Reference img.	Sensed img.	Vandewalle [121]		Evangelidis [124]		Proposed
			PSNR	MSE	PSNR	MSE	PSNR	MSE
Artichoke	1	2	26.88	11.59	31.5	6.78	31.8	6.72
Artichoke	27	28	26.86	11.17	42.08	1.93	31.06	6.92
CIL	HorizR0	HorizR1	24.02	13.2	12.4	50.4	24.73	12.66
CIL	VertR4	VertR5	20.66	21.57	22.2	18.46	20.75	22.5
MDSP Bookcase 1	2	3	26.58	11.90	12.52	60.31	25.10	14.11

TABLE VII: Comparison of results (estimated shifts) in noisy environments with the “Pentagon” image for a $(0.25, 0.75)$ shift.

SNR	Foroosh [59]	Chen [126]	Proposed
10 dB	0.38 0.65	0.29 0.68	0.25 0.75
20 dB	0.31 0.71	0.28 0.74	0.25 0.75
30 dB	0.30 0.73	0.27 0.74	0.25 0.75
40 dB	0.29 0.74	0.27 0.74	0.25 0.75

Equations

$X^{l}_{i,j}$, $Y^{l}_{i,j}$, $Z^{l}_{i,j}$, $W^{l}_{i,j}$

$$D^{l}_{i,j}=\begin{cases}D^{l-1}_{i/2,\,j/2}+X^{l}_{i/2,\,j/2}, & i\text{ even},\ j\text{ even}\\ D^{l-1}_{i/2,\,\lfloor j/2\rfloor}+Y^{l}_{i/2,\,\lfloor j/2\rfloor}, & i\text{ even},\ j\text{ odd}\\ D^{l-1}_{\lfloor i/2\rfloor,\,j/2}+Z^{l}_{\lfloor i/2\rfloor,\,j/2}, & i\text{ odd},\ j\text{ even}\\ D^{l-1}_{\lfloor i/2\rfloor,\,\lfloor j/2\rfloor}+W^{l}_{\lfloor i/2\rfloor,\,\lfloor j/2\rfloor}, & i\text{ odd},\ j\text{ odd}\\ 0, & i=j=l=0\end{cases}$$

$a^{N'-k}_{i,j_{new}}$, with index terms $j_{1}$, $j_{2}$, $j_{3}$

$$\mathbf{a}_{J}=\sigma\cos(\theta)\,\mathbf{a}_{I}-\sigma\sin(\theta)\,\mathbf{b}_{I}$$

$$\mathbf{b}_{J}=\sigma\sin(\theta)\,\mathbf{a}_{I}+\sigma\cos(\theta)\,\mathbf{b}_{I}$$

$$\hat{\theta}=\arg\max_{\theta}\,\big(h_{I}\star h_{J}(\theta)\big)$$

$$h_{\tt img}=\sum_{i=1}^{k}\arctan\!\left(\frac{b_{\tt img}(i)}{a_{\tt img}(i)}\right)$$

$$\hat{\theta}^{*}=\arg\min_{\hat{\theta}}\;\|\mathbf{a}_{J}-\mathbf{R}\,\mathbf{a}_{I}\|_{2}+\|\mathbf{b}_{J}-\mathbf{R}\,\mathbf{b}_{I}\|_{2}$$

$$\hat{\sigma}=\frac{1}{2}\left(\frac{\frac{1}{M_{I}}\sum_{i=1}^{M_{I}}{\cal R}(a_{I}(i))}{\frac{1}{M_{J}}\sum_{i=1}^{M_{J}}{\cal R}(a_{J}(i))}+\frac{\frac{1}{M_{I}}\sum_{i=1}^{M_{I}}{\cal R}(b_{I}(i))}{\frac{1}{M_{J}}\sum_{i=1}^{M_{J}}{\cal R}(b_{J}(i))}\right)$$

$$\{\hat{t}_{x},\hat{t}_{y}\}=\arg\max_{t_{x},t_{y}}\;\frac{\sum_{x,y}\mathbf{a}_{I}(x+t_{x},y+t_{y})\,\mathbf{a}_{J}(x',y')}{\sum_{x,y}\big(\mathbf{a}_{I}(x+t_{x},y+t_{y})\big)^{2}\,\sum_{x,y}\big(\mathbf{a}_{J}(x',y')\big)^{2}}+\frac{\sum_{x,y}\mathbf{b}_{I}(x+t_{x},y+t_{y})\,\mathbf{b}_{J}(x',y')}{\sum_{x,y}\big(\mathbf{b}_{I}(x+t_{x},y+t_{y})\big)^{2}\,\sum_{x,y}\big(\mathbf{b}_{J}(x',y')\big)^{2}}$$


Taxonomy

Topics: Advanced MRI Techniques and Applications · Sparse and Compressive Sensing Techniques · Photoacoustic and Ultrasonic Imaging

Full text


Vildan Atalay Aydin and Hassan Foroosh (Department of Computer Science, University of Central Florida, Orlando, FL, 32816 USA)

Abstract

Sub-pixel registration is a crucial step for applications such as super-resolution in remote sensing, motion compensation in magnetic resonance imaging, and non-destructive testing in manufacturing, to name a few. Recently, these technologies have been trending towards wavelet-encoded imaging and sparse/compressive sensing. The former plays a crucial role in reducing imaging artifacts, while the latter significantly increases acquisition speed. In view of these emerging needs, we propose a sub-pixel registration method that achieves direct wavelet-domain registration from a sparse set of coefficients. We make the following contributions: (i) we devise a method of decoupling scale, rotation, and translation parameters in the Haar wavelet domain; (ii) we derive explicit mathematical expressions that define in-band sub-pixel registration in terms of wavelet coefficients; (iii) using the derived expressions, we propose an approach to achieve in-band sub-pixel registration, avoiding back-and-forth transformations; and (iv) our solution remains highly accurate even when only a sparse set of coefficients is used, owing to the localization of signals in a sparse set of wavelet coefficients. We demonstrate the accuracy of our method and show that it outperforms the state of the art on simulated and real data, even when the data are sparse.

Index Terms:

Subpixel Registration, Wavelet Decomposition, Haar Wavelets, Image Pyramids

I Introduction

Image registration plays a crucial role in many areas of image and video processing, such as super-resolution [1, 2, 3, 4, 5, 6, 7, 8, 9], self-localization [10, 11, 12], image annotation [13, 14, 15, 16, 17, 18], surveillance [19, 20, 21], action recognition [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], target tracking [34, 35, 36, 37], shape description and object recognition [38, 39, 40, 40], image-based rendering [41, 42, 43, 44], and camera motion estimation [45, 12, 46, 47, 48, 49, 50, 51], to name a few.

Image registration methods can be categorized in several ways. In terms of the domain in which they operate, they are either spatial-domain [52, 53, 54, 55] or transform-domain methods [56, 57, 58, 59, 60, 61, 62, 63, 64]. In terms of their dependence on feature/point correspondences, they are either dependent [65, 66, 67, 68, 69, 70, 71] or independent [52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64] of such correspondences. Finally, in terms of the complexity of the image transformation, they are either linear parametric (e.g., Euclidean, affine, or projective) [72, 73, 74, 75, 76] or semi-parametric/non-parametric diffeomorphic [71, 77, 78, 79, 80]. The method we propose in this paper is a parametric method that can handle full projective transformations in the Haar wavelet domain without establishing any feature or point correspondences as a preprocessing step.

Recently, there has been a trend in various imaging modalities and applications, such as non-destructive testing and Magnetic Resonance Imaging (MRI), to adopt wavelet-encoded imaging [81, 82] and sparse sensing [83, 84, 85], with the aim of achieving better resolution, reduced distortions, higher SNR, and shorter acquisition times, all of which are crucial for these applications. Sub-pixel registration is an integral step in many applications of these wavelet-encoded compressive imaging technologies. Our goal in this paper is therefore a wavelet-domain sub-pixel registration method that achieves highly accurate results from a sparse set of wavelet coefficients. We make the following major contributions towards this goal: (i) we devise a method of decoupling scale, rotation, and translation parameters in the Haar wavelet domain; (ii) we derive explicit mathematical expressions that define in-band sub-pixel registration in terms of Haar wavelet coefficients; (iii) using the derived expressions, we propose a multiscale approach to achieve in-band sub-pixel registration, avoiding back-and-forth transformations; and (iv) our solution remains highly accurate even when only a sparse set of coefficients is used, due to signal energy localization in a sparse set of wavelet coefficients. Extensive experiments validate our method on both simulated and real data under various scenarios.

II Related Work

The earliest methods related to our work are based on image pyramids, with the aim of reducing computational time and avoiding local extrema. Examples include the work by Thévenaz et al. [86], who minimized the mean square intensity difference using a modified Levenberg-Marquardt algorithm, and the work by Chen et al. [87], who maximized mutual information with a pyramid approach. Later, these approaches were extended to handle local deformations in a coarse-to-fine fashion, either by estimating a set of local parameters [88] or by fitting a local model such as multi-resolution splines [89]. Cole-Rhodes et al. [90] proposed a method that maximizes mutual information using stochastic gradient optimization. Other examples of coarse-to-fine schemes are the work by Gong et al. [91], where automatic image registration is performed using SIFT and mutual information, and by Ibn-Elhaj [92], where the bispectrum is used to register noisy images.

Hu and Acton [93] obtain sub-pixel accuracy by using a morphological pyramid structure with Levenberg-Marquardt optimization and bilinear interpolation. Kim et al. [94] apply the Canny edge operator in a hierarchical fashion. Zhou et al. [88] register local regions of interest in a coarse-to-fine fashion by estimating deformation parameters, and Szeliski and Coughlan [89] model the local motion field using multi-resolution splines.

Template matching has also been used in image registration to reduce computational cost. Ding et al. [95] utilized template matching with cross-correlation in a spatial-domain solution, while Rosenfeld and Vanderbrug [96, 97] used block averaging in template matching. Hirooka et al. [98] optimize a small number of template points at each level of the hierarchy, selected by evaluating the correlation of the images. In [99], Yoshimura and Kanade apply the Karhunen-Loeve expansion to a set of rotated templates to obtain eigen-images, which are used to approximate the templates in the set. Tanimoto [100] applies hierarchical template matching to reduce computation time and sensitivity to noise. Anisimov and Gorsky [101] work with templates of unknown orientation, location, and nonrectangular form.

Wavelet-based methods can be summarized as follows. Turcajova and Kautsky [102] used a separable fast discrete wavelet transform with normalized local cross-correlation matching based on a least-squares fit, where spline biorthogonal and Haar wavelets outperformed other types of wavelets. In [103], Kekre et al. use several transforms, such as the discrete cosine, discrete wavelet, Haar, and Walsh transforms, for color image registration by minimizing the mean square error. Wang et al. [104] improve the polynomial subdivision algorithm for wavelet-based sub-pixel image registration. Le Moigne et al. [105, 106, 107, 108, 109, 110] have extensively studied various aspects of wavelet-domain image registration, utilizing in particular the maxima of Daubechies wavelets for correlation-based registration and multi-level optimization. In [111], Patil and Singhai use the fast discrete curvelet transform with quincunx sampling for sub-pixel accuracy. Tomiya and Ageishi [112] minimize the mean square error, whereas Wu and Chung [113] utilize mutual information and the sum of differences with wavelet pyramids, and Wu et al. [114] proposed a wavelet-based model of motion as a linear combination of hierarchical basis functions for image registration. Hong and Zhang [115] combine feature-based and area-based registration, using wavelet-based features and relaxation-based matching techniques, while Alam et al. [116] utilize the approximation coefficients of curvelets with a conditional entropy-based objective function.

These methods require transformations between the spatial and transform domains, since they start in the uncompressed spatial domain and use the multiscale nature of wavelets to approximate and propagate the solution from coarser to finer levels until it is refined to good accuracy. Our method reaches high accuracy at a coarser level, with a sparse set of coefficients and no domain transformations.

III Sub-pixel Shifts in the Haar Domain

We first derive mathematical expressions that define in-band (i.e. direct wavelet-domain) shifts of an image, which will be used later for general registration under a similarity transformation (i.e. scale, rotation, and translation) [41].

III-A Notation

Table I summarizes the notations used throughout the paper, to streamline the understanding of the proposed method.

Superscripts of $A,a,b,c$ show the level of wavelet decomposition. Subscripts $x$ and $y$ show horizontal and vertical directions, respectively; and $new$ stands for the calculated shifted coefficients.

III-B In-band Shifts

We now present explicit mathematical expressions for an in-band translation of a given image.

Let $I(x,y)$ be a $2^{N}\times 2^{N}$ image, where $N$ is a positive integer. The Haar transform of this image consists of $N$ levels, where level $l$ holds the approximation coefficients $A^{l}_{i,j}$ and the horizontal, vertical, and diagonal detail coefficients $a^{l}_{i,j}$, $b^{l}_{i,j}$, and $c^{l}_{i,j}$, respectively, with $l=0,...,N-1$, $i=0,...,2^{l}-1$, and $j=0,...,2^{l}-1$.

Let,

[TABLE]

Also, let $D^{l}_{i,j}=A^{l}_{i,j}-A^{0}_{0,0}$ be the difference between $A^{l}_{i,j}$ and $A^{0}_{0,0}$, so that $A^{l}_{i,j}=A^{0}_{0,0}+D^{l}_{i,j}$.

The following formula relates $D^{l}_{i,j}$ to its parent level $l-1$:

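$$D^{l}_{i,j}=\begin{cases}D^{l-1}_{i/2,\,j/2}+X^{l}_{i/2,\,j/2}, & i\text{ even},\ j\text{ even}\\ D^{l-1}_{i/2,\,\lfloor j/2\rfloor}+Y^{l}_{i/2,\,\lfloor j/2\rfloor}, & i\text{ even},\ j\text{ odd}\\ D^{l-1}_{\lfloor i/2\rfloor,\,j/2}+Z^{l}_{\lfloor i/2\rfloor,\,j/2}, & i\text{ odd},\ j\text{ even}\\ D^{l-1}_{\lfloor i/2\rfloor,\,\lfloor j/2\rfloor}+W^{l}_{\lfloor i/2\rfloor,\,\lfloor j/2\rfloor}, & i\text{ odd},\ j\text{ odd}\\ 0, & i=j=l=0\end{cases}\qquad(2)$$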

Equation (2) shows that $D^{l}_{i,j}$ can be calculated iteratively for all $l$ using only the detail coefficients of the Haar transform, since $D^{0}_{0,0}=0$. We use $D^{l}_{i,j}$ to calculate the detail coefficients of the shifted image, which makes the shifting process in-band.
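To make the recursion of Eq. (2) concrete, the following Python sketch rebuilds every $D^{l}$ from detail coefficients alone and checks that $A^{0}_{0,0}+D^{N}$ reproduces the image. Since the definitions of $X$, $Y$, $Z$, and $W$ are not reproduced in this text, the sketch assumes an averaging (unnormalized) Haar transform with $i$ indexing rows and $j$ columns; under that assumption each of the four parity cases adds a fixed signed combination of the parent's three detail coefficients. The helper names `haar_pyramid` and `offsets_from_details` are illustrative and do not come from the paper.

```python
import numpy as np

def haar_pyramid(img):
    """Averaging (unnormalized) 2-D Haar analysis of a 2**N x 2**N image.

    approx[l] is the 2**l x 2**l approximation at level l (level 0 = coarsest,
    level N = the image); details[l] = (h, v, d) are the 2**l x 2**l detail
    maps that split level l into level l + 1.
    """
    n = int(np.log2(img.shape[0]))
    approx = [None] * (n + 1)
    details = [None] * n
    approx[n] = np.asarray(img, dtype=float)
    for l in range(n - 1, -1, -1):
        c = approx[l + 1]
        p00, p01 = c[0::2, 0::2], c[0::2, 1::2]   # (even row, even col), (even row, odd col)
        p10, p11 = c[1::2, 0::2], c[1::2, 1::2]   # (odd row, even col), (odd row, odd col)
        approx[l] = (p00 + p01 + p10 + p11) / 4
        h = (p00 - p01 + p10 - p11) / 4           # left-minus-right difference
        v = (p00 + p01 - p10 - p11) / 4           # top-minus-bottom difference
        d = (p00 - p01 - p10 + p11) / 4           # diagonal difference
        details[l] = (h, v, d)
    return approx, details


def offsets_from_details(details):
    """Build D^l (offset of each level-l approximation from A^0_{0,0}) level by
    level from detail coefficients only, starting from D^0 = 0."""
    D = np.zeros((1, 1))
    for h, v, d in details:
        nxt = np.empty((2 * D.shape[0], 2 * D.shape[1]))
        nxt[0::2, 0::2] = D + h + v + d   # i even, j even  (role of X)
        nxt[0::2, 1::2] = D - h + v - d   # i even, j odd   (role of Y)
        nxt[1::2, 0::2] = D + h - v - d   # i odd,  j even  (role of Z)
        nxt[1::2, 1::2] = D - h - v + d   # i odd,  j odd   (role of W)
        D = nxt
    return D


img = np.random.default_rng(0).random((8, 8))
approx, details = haar_pyramid(img)
D = offsets_from_details(details)
print(np.allclose(approx[0][0, 0] + D, img))   # True: A^N = A^0_{0,0} + D^N
```

With the orthonormal Haar normalization, the signed combinations would differ by constant factors, so the exact form of $X$, $Y$, $Z$, and $W$ depends on the convention used.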

A translational shift of a 2D image can be split into horizontal and vertical shifts; a diagonal shift is modeled as a horizontal shift followed by a vertical one. Unlike the common approach of modeling sub-pixel shifts as integer shifts of an upsampled version of the given image, our method models sub-pixel shifts directly in terms of the coefficients of the original levels.

Observation 3.1. Let the Haar transform of the image $I(x,y)$ have $N$ levels, with $I(x,y)$ at the $N$th level. Upsampling the image is equivalent to adding levels to the bottom of the Haar transform, setting their detail coefficients to zero, and keeping the approximation coefficients equal to those already at the $N$th level, i.e., $D^{N+h_{0}}_{i,j}=D^{N}_{\lfloor{i/2^{h_{0}}}\rfloor,\lfloor{j/2^{h_{0}}}\rfloor}$, where $0\leq{h_{0}}\leq{h}$.
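Observation 3.1 can be checked numerically. The sketch below assumes the averaging Haar convention and pixel-replication (nearest-neighbour) upsampling by $2^{h}$; it analyses the $h$ added levels and confirms that all of their detail coefficients vanish while the approximation returns to the original image. `haar_step` and `check_added_levels` are hypothetical helper names.

```python
import numpy as np

def haar_step(c):
    """One level of averaging 2-D Haar analysis: approximation and (h, v, d) details."""
    p00, p01 = c[0::2, 0::2], c[0::2, 1::2]
    p10, p11 = c[1::2, 0::2], c[1::2, 1::2]
    approx = (p00 + p01 + p10 + p11) / 4
    details = ((p00 - p01 + p10 - p11) / 4,   # horizontal
               (p00 + p01 - p10 - p11) / 4,   # vertical
               (p00 - p01 - p10 + p11) / 4)   # diagonal
    return approx, details


def check_added_levels(img, h):
    """Upsample img by 2**h (pixel replication), analyse the h added levels, and
    report whether all their details vanish and the approximation returns img."""
    c = np.kron(img, np.ones((2 ** h, 2 ** h)))
    details_zero = True
    for _ in range(h):
        c, details = haar_step(c)
        details_zero &= all(np.allclose(d, 0.0) for d in details)
    return details_zero, np.allclose(c, img)


img = np.random.default_rng(0).random((8, 8))
print(check_added_levels(img, h=3))   # (True, True)
```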

Observation 3.2. Shifting the upsampled image by $s$ samples is equivalent to shifting the original image by $s/2^{h}$ pixels, where $h$ is the number of added levels (i.e., the image is upsampled by a factor of $2^{h}$).

These observations allow us to shift a reference image by a sub-pixel amount without actually upsampling it, which saves memory, reduces computation, and avoids propagating interpolation errors.
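Observation 3.2 can be illustrated in one dimension. In the sketch below (our spatial-domain illustration, not the paper's in-band procedure), a signal is upsampled by $2^{h}$ through sample replication, shifted circularly by an integer $s$ on the fine grid, and box-averaged back to the original grid. Under this pixel-area sampling model, which is our assumption, the result equals a shift of $s/2^{h}$ pixels, i.e., a linear blend of neighbouring samples with weight $s/2^{h}$. The point of the observations is that this effect can be obtained without forming the upsampled signal.

```python
import numpy as np

def shift_by_upsampling(x, s, h):
    """Shift a 1-D signal by s / 2**h pixels (circularly) by shifting a
    2**h-times replicated copy by s samples and box-averaging back down."""
    factor = 2 ** h
    up = np.repeat(np.asarray(x, dtype=float), factor)  # h added levels, zero details
    moved = np.roll(up, s)                              # integer shift on the fine grid
    return moved.reshape(-1, factor).mean(axis=1)       # back to the original grid


x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0])
y = shift_by_upsampling(x, s=3, h=3)                    # a 3/8-pixel shift
frac = 3 / 8
print(np.allclose(y, (1 - frac) * x + frac * np.roll(x, 1)))   # True
```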

Now, let $N^{\prime}=N+h$ and $k=1+h,...,N+h$. For a horizontal translation, the horizontal detail coefficients of the shifted image are computed from the reference image coefficients by:

[TABLE]

where,

[TABLE]

Here, $s_{x}$ is the horizontal shift amount at the $(N+h)$th level (with the shift and $h$ obtained as in Observation 3.2), $k$ is the reduction level, and $t$ is the exponent of the highest power of 2 that divides the shift. For sub-pixel shifts, $t=0$, since the shift at the hypothetically added level is always an odd integer; $t$ is needed to generalize the equation to even shifts. When $k=1$, we set the coefficients that use $j_{2}$ in Eq. (3) to [math], since $j_{2}$ then has a non-integer value. The corresponding coefficients $a^{N^{\prime}-k}_{{i,j}_{new}}$ for vertical shifts are obtained by interchanging the $a$'s with $b$'s, $i$'s with $j$'s, and $m$'s with $n$'s in Eq. (3).
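The shift-dependent quantities used in Eq. (3) can be illustrated with a short Python sketch: for a given sub-pixel shift it finds the smallest number of hypothetically added levels $h$ and the integer shift $s$ with shift $= s/2^{h}$ (Observation 3.2), and it computes $t$ as the exponent of the largest power of 2 dividing an integer shift. The function names and the `max_levels` cap used to round non-dyadic shifts are our assumptions.

```python
def dyadic_decomposition(shift, max_levels=8):
    """Return (s, h) with shift ~= s / 2**h, choosing the smallest h (number of
    hypothetically added levels) for which s is an integer. Non-dyadic shifts
    are rounded to the nearest multiple of 2**-max_levels."""
    for h in range(max_levels + 1):
        scaled = shift * 2 ** h
        if scaled == int(scaled):
            return int(scaled), h
    return round(shift * 2 ** max_levels), max_levels


def two_adic_exponent(s):
    """t: exponent of the highest power of 2 dividing the nonzero integer s."""
    if s == 0:
        raise ValueError("t is undefined for a zero shift")
    t = 0
    while s % 2 == 0:
        s //= 2
        t += 1
    return t


for shift in (0.5, 0.375, -0.125):
    s, h = dyadic_decomposition(shift)
    print(shift, s, h, two_adic_exponent(s))
# 0.5 1 1 0
# 0.375 3 3 0
# -0.125 -1 3 0
```

A non-dyadic shift such as 0.33 is rounded here to 84/256 = 0.328125 at $h=8$, so it is only approached to within the resolution of the finest added level.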

By examining Eq. (3), it can be seen that each level of horizontal detail coefficients of the shifted image can be calculated using the original levels of the reference image, since $D^{N}_{i,j}$ is calculated in Eq. (2) using only the detail coefficients in its parent levels.

Here, we only demonstrate the formulae for horizontal detail coefficients. Approximation, vertical and diagonal detail coefficients of the shifted image can be described in a similar manner.

IV Sub-pixel Registration

We first show that scale, rotation, and translation can be decoupled in the wavelet domain. This is analogous to the decoupling of rotation and translation in the Fourier domain, where translation affects only the phase and rotation can be recovered from the magnitude. We then describe the proposed method for solving the decoupled registration problem for each of the separated parameters.

Let us assume that the sensed image is translated, rotated, and scaled with respect to the reference image, in that order. Also let $p\in I$ and $q\in J$ be two corresponding points, where $I$ and $J$ are the reference and sensed images, respectively. In homogeneous coordinates, $q$ can be expressed in terms of $p$ and the similarity transformation (scale, rotation, translation) as follows:

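$$\begin{bmatrix}q_{x}\\ q_{y}\\ 1\end{bmatrix}=\textbf{S}\,\textbf{R}\,\textbf{T}\begin{bmatrix}p_{x}\\ p_{y}\\ 1\end{bmatrix}=\begin{bmatrix}\sigma&0&0\\ 0&\sigma&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}\cos\theta&-\sin\theta&0\\ \sin\theta&\cos\theta&0\\ 0&0&1\end{bmatrix}\begin{bmatrix}1&0&t_{x}\\ 0&1&t_{y}\\ 0&0&1\end{bmatrix}\begin{bmatrix}p_{x}\\ p_{y}\\ 1\end{bmatrix}\qquad(4)$$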

where we assume the same scale $\sigma$ for both axes. Here, $\textbf{S}$, $\textbf{R}$, and $\textbf{T}$ denote the scale, rotation, and translation matrices, and $\sigma$, $\theta$, $t_{x}$, and $t_{y}$ denote the scale factor, the rotation angle in degrees, and the translations along the two axes, respectively. Although we assume the order of transformations $\textbf{S},\textbf{R},\textbf{T}$, we explain rotation recovery first to demonstrate the decoupling in the wavelet domain. Algorithm 1 shows the steps of the proposed in-band registration algorithm.

IV-A Rotation Recovery

Let $\mathbf{a}$ and $\mathbf{b}$ denote the horizontal and vertical detail coefficients of the input images, as in Section III, with subscripts $I$ and $J$ indicating the image. In the wavelet domain, Eq. (4) becomes:

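$$\mathbf{a}_{J}=\sigma\cos(\theta)\,\mathbf{a}_{I}-\sigma\sin(\theta)\,\mathbf{b}_{I},\qquad\mathbf{b}_{J}=\sigma\sin(\theta)\,\mathbf{a}_{I}+\sigma\cos(\theta)\,\mathbf{b}_{I}\qquad(5)$$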

Eq. (5) relates the Haar wavelet coefficients of the two images under a similarity transformation and shows that rotation and scale can be separated from translation, since the translation parameters do not appear in these equations. To recover rotation and scale independently, we also need to decouple $\sigma$ and $\theta$. From Eq. (5), dividing $\textbf{b}_{J}$ by $\textbf{a}_{J}$ eliminates the scale term; the result approximates the slopes of the local image gradients, since Haar detail coefficients can be viewed as estimates of partial derivatives. To obtain an initial estimate of the rotation angle, we apply wavelet thresholding [117] before computing the local slopes, which both reduces noise and sparsifies the coefficients. We then find an initial estimate of the rotation angle $\theta$ by maximizing the following cross-correlation:

$$\hat{\theta}=\arg\max_{\theta}\,\big(h_{I}\star h_{J}(\theta)\big)$$

where $\star$ denotes cross-correlation, and $h_{I}$ and $h_{J}$ are the histograms of wavelet-coefficient slopes (HWS) of the thresholded coefficients, defined as follows:

$$h_{\tt img}=\sum_{i=1}^{k}\arctan\!\left(\frac{b_{\tt img}(i)}{a_{\tt img}(i)}\right)$$

where $k$ is the number of bins and the subscript ${\tt img}\in\{I,J\}$. We then refine the initial estimate $\hat{\theta}$ within the range $\hat{\theta}\pm 5^{\circ}$ to obtain the best estimate $\hat{\theta}^{*}$:

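$$\hat{\theta}^{*}=\arg\min_{\hat{\theta}}\;\|\mathbf{a}_{J}-\mathbf{R}\,\mathbf{a}_{I}\|_{2}+\|\mathbf{b}_{J}-\mathbf{R}\,\mathbf{b}_{I}\|_{2}$$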
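The initial rotation estimate can be sketched as follows. The code thresholds coefficient pairs, builds a histogram of slope angles $\arctan(b/a)$ for each image, and takes the peak of the circular cross-correlation of the two histograms as $\hat{\theta}$. Several choices here are ours, not the paper's: pairs are kept when $\sqrt{a^{2}+b^{2}}$ exceeds a fraction `rel` of its maximum (the paper cites [117] for its thresholding rule), `arctan2` folded to $[-90^{\circ},90^{\circ})$ stands in for $\arctan(b/a)$ to avoid division by zero, and the bin count is arbitrary. Because slopes are defined modulo $180^{\circ}$, the angle is only recovered modulo $180^{\circ}$. The $\pm 5^{\circ}$ refinement step is not included.

```python
import numpy as np

def threshold_pairs(a, b, rel=0.1):
    """Hard-threshold (a, b) detail pairs: keep those whose magnitude
    sqrt(a^2 + b^2) exceeds rel times the largest magnitude (our choice)."""
    mag = np.hypot(a, b)
    keep = mag > rel * mag.max()
    return a[keep], b[keep]


def slope_histogram(a, b, n_bins=360):
    """Histogram of wavelet-coefficient slopes arctan(b / a), in degrees."""
    ang = np.degrees(np.arctan2(b, a))
    ang = (ang + 90.0) % 180.0 - 90.0          # fold to [-90, 90), like arctan(b/a)
    hist, _ = np.histogram(ang, bins=n_bins, range=(-90.0, 90.0))
    return hist.astype(float)


def initial_rotation(aI, bI, aJ, bJ, n_bins=360, rel=0.1):
    """Peak of the circular cross-correlation of the two slope histograms."""
    hI = slope_histogram(*threshold_pairs(aI, bI, rel), n_bins)
    hJ = slope_histogram(*threshold_pairs(aJ, bJ, rel), n_bins)
    corr = np.array([np.dot(hI, np.roll(hJ, -k)) for k in range(n_bins)])
    theta = np.argmax(corr) * 180.0 / n_bins   # lag in bins -> degrees
    return (theta + 90.0) % 180.0 - 90.0        # report in [-90, 90)


# Synthetic check using the coefficient relation of Eq. (5).
rng = np.random.default_rng(0)
aI, bI = 3.0 * rng.normal(size=(64, 64)), rng.normal(size=(64, 64))
sigma, theta = 2.0, np.radians(20.0)
aJ = sigma * (np.cos(theta) * aI - np.sin(theta) * bI)
bJ = sigma * (np.sin(theta) * aI + np.cos(theta) * bI)
print(initial_rotation(aI, bI, aJ, bJ))   # expected to be close to 20 degrees
```

Note that the scale factor $\sigma$ drops out of the histograms, which is the decoupling described above.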

IV-B Scale Recovery

Since scale, rotation, and translation are decoupled in the wavelet domain, scale can be estimated independently of rotation and translation. Assume that the two images have a scale ratio of $\sigma$. The mean radius of curvature computed on the thresholded wavelet coefficients then provides an accurate estimate of the scale factor:

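$$\hat{\sigma}=\frac{1}{2}\left(\frac{\frac{1}{M_{I}}\sum_{i=1}^{M_{I}}{\cal R}(a_{I}(i))}{\frac{1}{M_{J}}\sum_{i=1}^{M_{J}}{\cal R}(a_{J}(i))}+\frac{\frac{1}{M_{I}}\sum_{i=1}^{M_{I}}{\cal R}(b_{I}(i))}{\frac{1}{M_{J}}\sum_{i=1}^{M_{J}}{\cal R}(b_{J}(i))}\right)$$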

where ${\cal R}$ denotes the radius of curvature, and $M_{I}$ and $M_{J}$ are the numbers of coefficients over which the means are taken.

IV-C Translation Recovery

Once the scale and rotation parameters are recovered and compensated for, the translations $t_{x}$ and $t_{y}$ along the two axes can be recovered independently by maximizing the following normalized cross-correlation function:

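$$\{\hat{t}_{x},\hat{t}_{y}\}=\arg\max_{t_{x},t_{y}}\;\frac{\sum_{x,y}\mathbf{a}_{I}(x+t_{x},y+t_{y})\,\mathbf{a}_{J}(x',y')}{\sum_{x,y}\big(\mathbf{a}_{I}(x+t_{x},y+t_{y})\big)^{2}\,\sum_{x,y}\big(\mathbf{a}_{J}(x',y')\big)^{2}}+\frac{\sum_{x,y}\mathbf{b}_{I}(x+t_{x},y+t_{y})\,\mathbf{b}_{J}(x',y')}{\sum_{x,y}\big(\mathbf{b}_{I}(x+t_{x},y+t_{y})\big)^{2}\,\sum_{x,y}\big(\mathbf{b}_{J}(x',y')\big)^{2}}$$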

where $\mathbf{a}_{I}(x+t_{x},y+t_{y})$ and $\mathbf{b}_{I}(x+t_{x},y+t_{y})$ are the shifted versions of the reference detail coefficients (corresponding to $\mathbf{a}_{new}$ or $\mathbf{b}_{new}$ in the derivations of Section III-B), calculated using Eq. (3) (or the equivalent for the vertical coefficients); and $\mathbf{a}_{J}(x^{\prime},y^{\prime})$ and $\mathbf{b}_{J}(x^{\prime},y^{\prime})$ are the sensed image detail coefficients after rotation and scale compensation.
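The structure of the translation search can be sketched in Python as below: one normalized correlation term per detail band, summed and maximized over $(t_{x},t_{y})$. Because Eq. (3) is not reproduced in this text, the sketch swaps in a Fourier-domain (phase-ramp) sub-pixel shift of the coefficient maps in place of the in-band shift, uses the usual square-root-normalized cross-correlation, and replaces branch and bound with an exhaustive grid over multiples of `step`. All of these are our assumptions, and the function names are hypothetical.

```python
import numpy as np

def shift_map(f, tx, ty):
    """Return g with g[y, x] = f(y + ty, x + tx): a circular, band-limited
    (FFT phase-ramp) sub-pixel shift, used here as a stand-in for Eq. (3)."""
    fy = np.fft.fftfreq(f.shape[0])[:, None]   # cycles/sample along rows (y)
    fx = np.fft.fftfreq(f.shape[1])[None, :]   # cycles/sample along columns (x)
    ramp = np.exp(2j * np.pi * (fx * tx + fy * ty))
    return np.real(np.fft.ifft2(np.fft.fft2(f) * ramp))


def ncc(p, q):
    """Normalized cross-correlation of two equally sized coefficient maps."""
    return float(np.sum(p * q) / np.sqrt(np.sum(p * p) * np.sum(q * q)))


def recover_translation(aI, bI, aJ, bJ, step=0.125, max_shift=1.0):
    """Exhaustive grid search for (tx, ty) maximizing NCC(a) + NCC(b)."""
    grid = np.arange(-max_shift, max_shift + step / 2, step)
    best_score, best_t = -np.inf, (0.0, 0.0)
    for tx in grid:
        for ty in grid:
            score = (ncc(shift_map(aI, tx, ty), aJ)
                     + ncc(shift_map(bI, tx, ty), bJ))
            if score > best_score:
                best_score, best_t = score, (float(tx), float(ty))
    return best_t


rng = np.random.default_rng(1)
aI, bI = rng.normal(size=(16, 16)), rng.normal(size=(16, 16))
aJ, bJ = shift_map(aI, 0.25, -0.5), shift_map(bI, 0.25, -0.5)   # simulated sensed maps
print(recover_translation(aI, bI, aJ, bJ))   # (0.25, -0.5)
```

A grid of dyadic steps mirrors the observation that shifts which are multiples of $2^{-h}$ can be represented exactly; a branch-and-bound search would visit far fewer candidates.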

Observation 3.2 implies that sub-pixel registration of wavelet-encoded images can be performed directly in the wavelet domain, without an inverse transformation. Furthermore, if the encoded image is also compressed (e.g., only a sparse set of detail coefficients is available), the registration can still be performed; this would be the case, for instance, for a compressed sensing imager based on a Haar wavelet sampling basis. To maximize the normalized cross-correlation cost function of Section IV-C, we use a branch and bound (BnB) algorithm in which the splitting of rectangular regions is decided by the two largest cross-correlation values among the four bounds [118].

Algorithm 2 outlines the main steps of the proposed translation recovery method. For the updated bounds, the shifted horizontal/vertical detail coefficients are calculated at a specified level ($k$) using Eq. (3) (or the analogous equation for the vertical coefficients), and the normalized cross-correlation cost function is then maximized.

When the algorithm converges to within $\epsilon$ of the true solution, it often starts oscillating. As a modification of the general branch and bound method, we therefore take the midpoint of the oscillations as the solution, which often coincides with the true solution.

The method requires the knowledge of $A^{0}_{0,0}$ for in-band shifts which may limit the approach to image sizes of $2^{N}\times 2^{N}$ . However, the solution can be generalized to images with arbitrary sizes by simply applying the method to a subregion of size $2^{N}\times 2^{N}$ of the original images.
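Extracting the $2^{N}\times 2^{N}$ subregion is straightforward; the sketch below takes the largest power-of-two square centered in the image. The centering and the function name are our assumptions (the text only requires a subregion of that size), and the same crop window should be applied to both the reference and sensed images.

```python
import numpy as np

def largest_dyadic_crop(img):
    """Centered 2**N x 2**N subregion with the largest N that fits in img."""
    side = 1 << (int(min(img.shape[:2])).bit_length() - 1)  # largest power of 2 <= min side
    r0 = (img.shape[0] - side) // 2
    c0 = (img.shape[1] - side) // 2
    return img[r0:r0 + side, c0:c0 + side]


crop = largest_dyadic_crop(np.zeros((300, 500)))
print(crop.shape)   # (256, 256)
```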

V Experimental Results

To demonstrate the accuracy of our algorithm, we performed extensive experiments on both simulated and real data. To simulate the reference and sensed images, a given high-resolution image is shifted (using bicubic interpolation) and rotated, and then both images are downsampled, a common technique in the state-of-the-art literature [59], [121]. When a scale change is simulated, the sensed image is additionally scaled. We performed thorough comparisons with state-of-the-art methods, which were given the same input images, and evaluated the results by measuring alignment errors. Fig. 1 shows some of the standard test images together with the real data obtained from [119] and [120]; captions for the real data indicate the dataset and the specific images used as reference images.

V-A Validation on Simulated Data

We first performed experiments on translation, rotation, and scale recovery separately, and then carried out tests for combinations of transformations.

Table II summarizes some of the results of our translation method on simulated data, compared with the ground truth (GT) and with the baseline methods [122], [123], and [89], in terms of estimated shifts, peak signal-to-noise ratio (PSNR), and mean square error (MSE). Since the expressions derived in Section III-B are exact for any shift that is an integer multiple of a (positive or negative) integer power of 2, exact or near-exact solutions can be achieved in the noise-free case, outperforming the state-of-the-art methods. Any other shift can be approximated to within the resolution of the finest hypothetically added level, and thus arbitrarily closely by increasing $h$, which still compares very favorably with the state of the art.

Table III shows the PSNR, MSE, and computational time of our rotation method compared with [121], averaged over 121 simulations. Although our technique can recover any rotation angle, Vandewalle's method [121] recovers only angles in the range $[-30^{\circ},30^{\circ}]$; for a fair comparison, we therefore tested every $0.5^{\circ}$ step in that range (121 angles).

We also ran our scale recovery method on 50 images with scale factors $\{1/4,1/2,1,2,4\}$. All experiments returned the exact scale in under $0.09$ seconds. Since the wavelet transform downsamples images by $2$ at every level, we can only recover scales that are integer powers of $2$.

Results obtained for combinations of transformations are shown in Tables IV and V. Table IV compares our method with Vandewalle's method for rotation and translation, and Table V presents our results for several combinations of scale, rotation, and translation. These tables also confirm that our method is accurate and, in most cases, outperforms or matches the state of the art.

V-B Optimal Parameters

To select the constants $\tau$ (the tolerance of the cross-correlation function, which controls accuracy) and $k$ (the reduction level of the Haar transform) for translation recovery, and to assess the accuracy of the proposed method, we tested our algorithm on 50 simulated test images with a shift of $(0.33,-0.33)$. Fig. 2 shows the results after removing outliers (cases where a local maximum is reached), with $PSNR=Inf$ plotted as $100$. As the figure shows, $\tau$ and $k$ can be tuned to trade off run time against PSNR.
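The PSNR and MSE figures used throughout this section can be computed as below. The sketch assumes 8-bit images (peak value 255), which the text does not state; it returns an infinite PSNR when the MSE is zero (the `Inf` entries in Table II) and maps infinity to 100 for plotting, as in Fig. 2.

```python
import numpy as np

def mse(ref, est):
    """Mean squared error between two images."""
    return float(np.mean((np.asarray(ref, float) - np.asarray(est, float)) ** 2))


def psnr(ref, est, peak=255.0):
    """PSNR in dB; infinite when the two images are identical (MSE = 0)."""
    err = mse(ref, est)
    return float("inf") if err == 0 else 10.0 * np.log10(peak ** 2 / err)


def psnr_for_plot(value, cap=100.0):
    """Replace PSNR = Inf by a finite value so it can be plotted."""
    return cap if np.isinf(value) else value


print(psnr_for_plot(psnr(np.ones((4, 4)), np.ones((4, 4)))))   # 100.0
```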

In the most general case of a similarity transformation, $k$ is chosen based on the recovered scale $\hat{\sigma}$: $k=1$ if $\hat{\sigma}<1$, and $k=\hat{\sigma}+1$ otherwise.

V-C Validation on Real Data

To assess accuracy on real data, real-world images were also used as input. Results for the real-world examples (d, e, and f in Fig. 1), including comparisons with the state-of-the-art methods [124] and [121], are summarized in Table VI. Since the GT for these images is not known, the results are compared using PSNR and MSE, as is common practice in the literature. All methods are given the same input, using smaller image regions to adapt the image sizes to our method as described in Section IV-C. As Table VI shows, our method achieves the highest PSNR in two of the five real-world cases and outperforms at least one of the baseline methods in every case.

V-D The Effect of Noise and Sparseness

Our proposed approaches for scale and rotation estimation already suppress noise through hard wavelet thresholding, so here we discuss noise only in translation estimation. Table VII compares the proposed method with [59] and [125] under noisy conditions. By adapting $\tau$ based on the noise level and cross-validation, very accurate shift values can be achieved. Table VII indicates that our method handles Gaussian noise well and is superior to the state of the art. To further assess accuracy under noise, the proposed algorithm was tested on 50 images with 50 different shift amounts for each image, with added Gaussian noise. Results, after removing outliers, are shown in Fig. 3 as average SNR with respect to $\tau$ and $\sigma$.

Since our method works entirely in-band (i.e. using only detail coefficients), it is particularly suited to wavelet-encoded imaging. Moreover, our approach can work with a sparse subset of coefficients, as arises, e.g., in compressed sensing of wavelet-encoded images. Our scale and rotation recovery methods already use sparse coefficients (i.e. hard-thresholded wavelet coefficients), so we experimented on translational shifts under sparseness. We tested our method as the fraction of retained detail coefficients varied from 2% to 100%, for several simulated images and different shifts. We then fitted a model to the average results to evaluate the trend, shown in Fig. 4-a. Even at very sparse levels of detail coefficients, the method is stable, with an average PSNR above 46 dB. Beyond 50% of sampled detail coefficients, the PSNR grows exponentially.
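The sparse-coefficient setting can be simulated as in the sketch below. It makes the following assumptions:

- a single-level orthonormal 2-D Haar transform on an image with even side lengths;
- the three detail bands are pooled before a fraction of the coefficients is kept.

Two selection rules are shown:

- `"random"` is an assumption about how the sparse subsets in the 2% to 100% experiment might be drawn; the section does not say.
- `"largest"` is hard thresholding of the kind used for scale and rotation recovery.

`haar_level1` and `sparsify_details` are hypothetical helpers, not the authors' code.

```python
import numpy as np


def haar_level1(image):
    """One level of the orthonormal 2-D Haar transform (even-sized input).

    Returns the approximation band and a tuple of three detail bands.
    """
    x = np.asarray(image, dtype=np.float64)
    a, b = x[0::2, 0::2], x[0::2, 1::2]  # top-left, top-right of each 2x2 block
    c, d = x[1::2, 0::2], x[1::2, 1::2]  # bottom-left, bottom-right
    approx = (a + b + c + d) / 2.0
    detail_cols = (a - b + c - d) / 2.0  # differences across columns
    detail_rows = (a + b - c - d) / 2.0  # differences across rows
    detail_diag = (a - b - c + d) / 2.0  # diagonal differences
    return approx, (detail_cols, detail_rows, detail_diag)


def sparsify_details(details, fraction, mode="random", rng=None):
    """Keep round(fraction * total) detail coefficients, zeroing the rest."""
    stacked = np.stack(details)  # shape (3, H/2, W/2)
    flat = stacked.ravel()
    n_keep = int(round(fraction * flat.size))
    keep = np.zeros(flat.size, dtype=bool)
    if mode == "random":
        # Assumed sampling scheme: uniform random subset without replacement.
        rng = np.random.default_rng() if rng is None else rng
        keep[rng.choice(flat.size, size=n_keep, replace=False)] = True
    elif mode == "largest":
        # Hard thresholding: keep the n_keep coefficients of largest magnitude.
        if n_keep > 0:
            keep[np.argsort(np.abs(flat))[-n_keep:]] = True
    else:
        raise ValueError("mode must be 'random' or 'largest'")
    return tuple(np.where(keep, flat, 0.0).reshape(stacked.shape))


# Usage: keep 2% of the detail coefficients of a random 64x64 image.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))
_, details = haar_level1(img)
sparse = sparsify_details(details, 0.02, rng=rng)
print(sum(int(np.count_nonzero(band)) for band in sparse))
```

Because the transform is orthonormal, the image energy equals the total energy of the four bands. This makes it easy to check how much detail energy a given sparsity level discards.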

V-E Computational Complexity and Convergence Rate

The time complexity of our method depends on in-band shifting, parameter selection, and the level of sparseness. The in-band shifting method of Section III-B has a complexity of $O((L/2^{N-k+1})^{2})$ for $k=1,\dots,N$, where $L$ is the size of the image (or of its sparsified version). Parameter selection also affects the complexity: when $\tau$ is higher, the method attempts to match the images with higher accuracy, which increases the run time. Running times of our method, together with comparisons, are reported in Tables III, IV and V. They were measured on a machine with a 2.7 GHz CPU and 8 GB of RAM.
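Suppose the stated per-level cost is incurred at every level $k=1,\dots,N$. This is an assumption about how the levels are traversed, which the text does not state explicitly. The total cost of in-band shifting is then a geometric series, obtained by substituting $j=N-k+1$:

$$\sum_{k=1}^{N}\left(\frac{L}{2^{N-k+1}}\right)^{2}=L^{2}\sum_{j=1}^{N}4^{-j}=\frac{L^{2}}{3}\left(1-4^{-N}\right)<\frac{L^{2}}{3}.$$

Under this reading, the overall in-band shifting cost is $O(L^{2})$. The level $k=N$ alone, at $(L/2)^{2}=L^{2}/4$, accounts for three quarters of the bound.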

Fig. 4-b shows the convergence of our method to the global cross-correlation maximum for three cases:

- the Lena image with GT $(0.5,0.5)$ (blue circles);
- the Pentagon image with GT $(0.25,0.5)$ (green stars);
- the Cameraman image with GT $(0.33,-0.33)$ (red line).

The convergence is visibly exponential, so the method reaches the solution very rapidly.
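Fig. 4-b supports the exponential-convergence claim visually. One simple way to quantify it is sketched below as an illustration, not as part of the authors' method. The idea is to fit $e_i \approx e_0 r^i$ to a trace of per-iteration errors $e_i$ by least squares on $\log e_i$. The error could be, for instance, the distance between the current shift estimate and the GT. A fitted $0<r<1$ indicates geometric convergence, and the fit predicts how many iterations a given tolerance requires. The error trace and both function names are hypothetical.

```python
import numpy as np


def fit_geometric_rate(errors):
    """Fit e_i ~ e0 * r**i via least squares on log(e_i); return (e0, r)."""
    errors = np.asarray(errors, dtype=np.float64)
    if np.any(errors <= 0):
        raise ValueError("errors must be strictly positive to take logarithms")
    i = np.arange(errors.size)
    slope, intercept = np.polyfit(i, np.log(errors), 1)  # highest degree first
    return float(np.exp(intercept)), float(np.exp(slope))


def iterations_to_tolerance(e0, r, tol):
    """Smallest integer i >= 0 with e0 * r**i <= tol, for 0 < r < 1."""
    if not 0.0 < r < 1.0:
        raise ValueError("r must lie in (0, 1) for the error to shrink")
    if e0 <= tol:
        return 0
    return int(np.ceil(np.log(tol / e0) / np.log(r)))


# Usage on a synthetic trace whose error halves at every iteration.
trace = 0.5 * 0.5 ** np.arange(10)
e0, r = fit_geometric_rate(trace)
print(round(e0, 6), round(r, 6), iterations_to_tolerance(e0, r, 1e-3))
```

Fitting in log space turns the exponential model into a straight line. The slope then gives the per-iteration contraction factor directly.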

VI Conclusion

A sub-pixel registration technique for sparse Haar-encoded images has been proposed. A sparse set of detail coefficients alone is sufficient to establish the cross-correlation between images for scale, rotation, and translation recovery. The registration is thus performed solely in-band. This makes the method capable of handling both in-band registration for wavelet-encoded imaging systems and sparsely sensed data from a wavelet-based compressive sensing imager. Moreover, our method conveniently decouples the scale, rotation and translation parameters. It also exploits important features of the Haar wavelet, such as its multi-resolution representation and signal energy localization. Our method does not use image interpolation to estimate the registration parameters, since the exact set of in-band equations is derived for establishing the registration and fitting the parameters. The run time of our method is higher than that of the compared methods, but this is a reasonable trade-off for its far better accuracy. Overall, our results show that the method outperforms the baseline methods in terms of accuracy and resilience to noise.

Bibliography

[1] H. Shekarforoush (Foroosh) and R. Chellappa, “Data-driven multi-channel super-resolution with application to video sequences,” Journal of the Optical Society of America A, vol. 16, no. 3, pp. 481–492, 1999.
[2] H. Shekarforoush (Foroosh), M. Berthod, J. Zerubia, and M. Werman, “Subpixel Bayesian estimation of albedo and height,” International Journal of Computer Vision, vol. 19, no. 3, pp. 289–300, 1996.
[3] H. Shekarforoush, M. Berthod, and J. Zerubia, “3D super-resolution using generalized sampling expansion,” in Proc. International Conference on Image Processing, vol. 2, pp. 300–303, IEEE, 1995.
[4] A. Lorette, H. Shekarforoush, and J. Zerubia, “Super-resolution with adaptive regularization,” in Proc. International Conference on Image Processing, vol. 1, pp. 169–172, IEEE, 1997.
[5] H. Shekarforoush, R. Chellappa, H. Niemann, H. Seidel, and B. Girod, “Multi-channel superresolution for image sequences with applications to airborne video data,” in Proc. IEEE Image and Multidimensional Digital Signal Processing, pp. 207–210, 1998.
[6] M. Berthod, M. Werman, H. Shekarforoush, and J. Zerubia, “Refining depth and luminance information using super-resolution,” in Proc. Computer Vision and Pattern Recognition, pp. 654–657, 1994.
[7] H. Shekarforoush, Conditioning Bounds for Multi-Frame Super-Resolution Algorithms. Computer Vision Laboratory, Center for Automation Research, University of Maryland, 1999.
[8] A. Jain, S. Murali, N. Papp, K. Thompson, K.-s. Lee, P. Meemon, H. Foroosh, and J. P. Rolland, “Super-resolution imaging combining the design of an optical coherence microscope objective with liquid-lens based dynamic focusing capability and computational methods,” in Optical Engineering + Applications, pp. 70610C, International Society for Optics and Photonics, 2008.